Why Facebook, Instagram, and WhatsApp All Went Down Yesterday

The problem relates to something called BGP routing, and it took down every part of Facebook’s business.

A Facebook, Instagram, WhatsApp, and Oculus outage knocked every corner of Mark Zuckerberg’s empire offline on Monday. It’s a social media blackout that can most charitably be described as “thorough” and seems likely to prove particularly tough to fix.

Facebook itself has not confirmed the root cause of its woes, but clues abound on the internet. The company’s family of apps effectively fell off the face of the internet at 11:40 am ET, judging by when its Domain Name System records became unreachable. DNS is often referred to as the internet’s phone book; it’s what translates the host names you type into a browser’s address bar, like facebook.com, into IP addresses, which is where those sites live.
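To make the phone-book analogy concrete, here is a minimal Python sketch of how a resolver walks the DNS hierarchy: it asks the root, gets pointed to the servers for .com, and is finally pointed to the domain’s own authoritative nameservers, which hold the actual address. The zone names, the nameserver labels (`root`, `com-ns`, `example-ns`), and the 192.0.2.10 address are hypothetical stand-ins; real resolvers exchange DNS messages over the network rather than reading a dictionary.

```python
# A toy model of DNS resolution over a tiny hypothetical namespace.
# Real resolvers speak the DNS wire protocol over UDP/TCP; this only
# mirrors the idea of following delegations down to an authoritative answer.

# Each hypothetical "nameserver" maps a zone to either a delegation
# (the nameserver responsible for that zone) or a final IP address.
NAMESERVERS = {
    "root": {"com.": ("delegate", "com-ns")},
    "com-ns": {"example.com.": ("delegate", "example-ns")},
    "example-ns": {"www.example.com.": ("address", "192.0.2.10")},
}


def covers(zone: str, name: str) -> bool:
    """True if `name` is the zone itself or sits somewhere beneath it."""
    return name == zone or name.endswith("." + zone)


def resolve(hostname: str, server: str = "root") -> str:
    """Follow delegations from the root until an address is found."""
    name = hostname if hostname.endswith(".") else hostname + "."
    records = NAMESERVERS[server]
    matches = [zone for zone in records if covers(zone, name)]
    if not matches:
        raise LookupError(f"{server} has no record for {name}")
    # Use the most specific zone this server knows about.
    kind, value = records[max(matches, key=len)]
    if kind == "address":
        return value
    return resolve(name, value)  # ask the delegated nameserver next


print(resolve("www.example.com"))  # 192.0.2.10
```

The last hop matters for this story: if the domain’s own nameservers cannot be reached, the walk stops there, no matter how healthy the root and .com servers are.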

DNS mishaps are common enough that, when in doubt, they’re the likeliest reason a given site has gone down. They can happen for all kinds of wonky technical reasons, often related to configuration issues, and can be relatively straightforward to resolve. In this case, though, something more serious appears to be afoot.

“Facebook’s outage appears to be caused by DNS; however, that’s just a symptom of the problem,” says Troy Mursch, chief research officer of cyberthreat intelligence company Bad Packets. The fundamental issue, Mursch says (and other experts agree), is that Facebook has withdrawn the so-called Border Gateway Protocol route to the block of IP addresses that contains its DNS nameservers. If DNS is the internet’s phone book, BGP is its navigation system; it decides what route data takes as it travels the information superhighway.
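Mursch’s distinction between symptom and cause can be sketched in a few lines of Python. Under the simplifying assumption that an address is reachable only when some announced prefix covers it, withdrawing the one prefix that holds the nameservers is enough to make every lookup fail, even though the block holding the web servers is still announced. All prefixes and addresses here come from the RFC 5737 documentation ranges and are hypothetical, not Facebook’s real ones.

```python
import ipaddress

# Hypothetical prefixes announced via BGP (documentation ranges only).
announced = {
    ipaddress.ip_network("192.0.2.0/24"),     # hypothetical block for web servers
    ipaddress.ip_network("198.51.100.0/24"),  # hypothetical block for nameservers
}

# Hypothetical authoritative nameserver addresses for the zone.
NAMESERVER_IPS = ["198.51.100.12", "198.51.100.13"]


def reachable(ip: str) -> bool:
    """Simplification: an address is reachable only if an announced prefix covers it."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in announced)


def query_zone() -> str:
    """Return 'NOERROR' if any nameserver can be reached, else 'SERVFAIL'."""
    if any(reachable(ip) for ip in NAMESERVER_IPS):
        return "NOERROR"
    return "SERVFAIL"  # every nameserver is unreachable, so the resolver gives up


print(query_zone())  # NOERROR
announced.discard(ipaddress.ip_network("198.51.100.0/24"))  # the route is withdrawn
print(query_zone())  # SERVFAIL, though 192.0.2.0/24 is still announced
```

`SERVFAIL` is the standard DNS response code a resolver returns when it cannot get an answer; everything else in the model is a deliberate simplification.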

“You can think of it like a game of telephone,” but instead of people playing, it’s smaller networks letting each other know how to reach them, says Angelique Medina, director of product marketing at the network monitoring firm Cisco ThousandEyes. “They announce this route to their neighbor and their neighbor will propagate it out to their peers.”
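Medina’s game of telephone resembles how BGP spreads routes: each network (an autonomous system, or AS) hears a path from a neighbor, adds its own number to the front, and passes it on. The Python sketch below floods one announcement across a small hypothetical peering graph using private-use AS numbers. It keeps the first path each network hears, which in this breadth-first model is the shortest; real BGP picks routes according to each network’s policies, so treat this as an illustration of propagation, not of route selection.

```python
from collections import deque

# Hypothetical peering graph: each autonomous system (AS) lists the
# neighbors it exchanges routes with. The numbers come from the
# private-use range (64512-65534) and do not belong to real networks.
PEERS = {
    64512: [64513, 64514],  # the network originating the prefix
    64513: [64512, 64515],
    64514: [64512, 64515, 64516],
    64515: [64513, 64514],
    64516: [64514],
}


def propagate(origin: int) -> dict[int, list[int]]:
    """Flood one announcement outward from `origin`.

    Returns, for every AS that hears about the prefix, the AS path it would
    advertise onward (its own number first, the origin last).
    """
    routes = {origin: [origin]}
    queue = deque([origin])
    while queue:
        speaker = queue.popleft()
        path = routes[speaker]
        for neighbor in PEERS[speaker]:
            if neighbor in path:
                continue  # BGP loop prevention: reject paths containing yourself
            if neighbor in routes:
                continue  # this simplified model keeps the first path heard
            routes[neighbor] = [neighbor] + path  # prepend own AS, pass it on
            queue.append(neighbor)
    return routes


# Print the path each AS learned from its neighbor (excluding itself).
for asn, path in sorted(propagate(64512).items()):
    print(asn, "->", " ".join(map(str, path[1:])) or "(origin)")
```

A withdrawal travels the same way: each network drops the route and tells its peers, until no one is left holding a path to the prefix.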

It’s a lot of jargon, but easy to put plainly: Facebook has fallen off the internet’s map. If you try to ping those IP addresses right now? “The packets end up in a black hole,” Mursch says.
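What “a black hole” means at the router level can be sketched with a longest-prefix-match lookup, the rule routers use to pick the most specific route covering a destination. In this hypothetical Python model (documentation-range prefixes, made-up next-hop names `peer-A`, `peer-B`, `peer-C`), once the routes covering an address are withdrawn, the lookup finds nothing and the packet has nowhere to go. The sketch assumes no catch-all default route, as is the case for routers in the internet’s default-free zone that carry full routing tables.

```python
import ipaddress
from typing import Optional

# A hypothetical forwarding table: prefix -> next hop. No 0.0.0.0/0
# default route is present, so unmatched destinations have nowhere to go.
FIB = {
    ipaddress.ip_network("198.51.100.0/24"): "peer-A",
    ipaddress.ip_network("198.51.100.0/25"): "peer-B",  # more specific route
    ipaddress.ip_network("203.0.113.0/24"): "peer-C",
}


def next_hop(dest: str) -> Optional[str]:
    """Longest-prefix match: the most specific covering route wins."""
    addr = ipaddress.ip_address(dest)
    matches = [net for net in FIB if addr in net]
    if not matches:
        return None  # no route: the packet is dropped
    return FIB[max(matches, key=lambda net: net.prefixlen)]


print(next_hop("198.51.100.12"))   # peer-B (the /25 beats the /24)
print(next_hop("198.51.100.200"))  # peer-A (outside the /25)

# Withdraw both routes covering 198.51.100.0/24.
del FIB[ipaddress.ip_network("198.51.100.0/25")]
del FIB[ipaddress.ip_network("198.51.100.0/24")]
print(next_hop("198.51.100.12"))   # None: nowhere to send it
```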

A map shows where Facebook is unreachable due to DNS resolution failures: basically, it’s everywhere, all at once. (Courtesy of Cisco ThousandEyes)

The obvious and still unresolved question is why those BGP routes disappeared in the first place. It’s not a common ailment, especially at this scale or for this duration. During the outage, Facebook said nothing beyond a tweet that it was “working to get things back to normal as quickly as possible.” After service came trickling back late Monday afternoon, it sent a statement that still lacked any technical detail. “To everyone who was affected by the outages on our platforms today: we’re sorry,” the company said. “We know billions of people and businesses around the world depend on our products and services to stay connected. We appreciate your patience as we come back online.”

The internet infrastructure experts who spoke to WIRED all suggested the likeliest answer was a misconfiguration on Facebook’s part. “It appears that Facebook has done something to their routers, the ones that connect the Facebook network to the rest of the internet,” says John Graham-Cumming, CTO of internet infrastructure company Cloudflare, who stressed that he doesn’t know the details of what happened. After all, he says, the internet is essentially a network of networks, each advertising its presence to the others. For once, Facebook has stopped advertising.

Which also means that more than just Facebook’s external services are affected. You can’t use “Login with Facebook” on third-party sites, for instance. And since the company’s own internal networks can’t reach the outside internet, its employees reportedly can’t get much done today either. (Instagram CEO Adam Mosseri even tweeted that “it does feel like a snow day.”)

That could also help explain why it’s taking so long to get back up and running. In 2019, a Google Cloud outage prevented Google engineers from getting online to fix the Google Cloud outage keeping them offline. It seems at least possible that Facebook is stuck in a similar catch-22, unable to reach the internet to fix the BGP routing issue that would let it reach the internet.

The good news is that once Facebook is able to revert whatever configuration got it into this, it shouldn’t take long to be back in business. “When it’s corrected, the traffic will really start flowing,” says Medina.

Meanwhile, the rest of the internet has felt Facebook’s absence. Or, more specifically, DNS resolvers like Cloudflare’s (services that convert those domain names into IP addresses) have seen as much as double the usual amount of traffic, as people keep trying to load Facebook, Instagram, and WhatsApp to no avail. Those requests aren’t enough to overwhelm the system, but the surge is a reminder of just how interdependent, and sometimes fragile, the internet really is.
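One plausible contributor to that surge, sketched here under loudly simplified assumptions, is that successful lookups get cached while failed ones largely do not. The hypothetical Python function below counts how many queries a single device sends to its resolver: a good answer is assumed to be cached for the whole period, while a failure is assumed to trigger a fresh query on every attempt, plus a few automatic retries (`retries_per_open` is an invented parameter). The numbers it prints illustrate the mechanism only; they are not a model of the traffic Cloudflare actually measured.

```python
def resolver_queries(opens: int, lookup_fails: bool, retries_per_open: int = 3) -> int:
    """Count the DNS queries one device sends to its recursive resolver.

    Hypothetical, simplified assumptions:
    - a successful answer is cached on the device for the whole period,
      so only the first app open triggers a query;
    - a failed lookup leaves nothing useful to cache, so every open sends
      a fresh query plus `retries_per_open` automatic retries.
    """
    if opens <= 0:
        return 0
    if not lookup_fails:
        return 1  # later opens are served from the device's cache
    return opens * (1 + retries_per_open)


# A user who opens the app five times during the period:
print(resolver_queries(5, lookup_fails=False))  # 1
print(resolver_queries(5, lookup_fails=True))   # 20
```

The gap between the two calls depends entirely on the assumed retry count and caching behavior; the point is only that failures multiply the queries reaching resolvers, whereas successes are absorbed by caches.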

“It’s not so much the dramatic story of the whole internet could fall over, or some nonsense like that,” says Graham-Cumming. “It’s more that it’s an interconnected system and it stays up partly because of technical things and partly because of people who keep an eye on it day and night.”
