Facebook: a six-hour total outage caused by a simple update

“Is the Internet down? Nothing works anymore!” is perhaps something you heard last night. In reality, it was all of Facebook's services that were inaccessible: the social network, Instagram, Messenger and WhatsApp, which are among the most used services in the world.

Some suspected an attack, but it was not one. It was a failure that followed the deployment of a maintenance update to the company's routers. The outage was so extensive that even internal equipment was unreachable, which notably locked out technicians because their badges were no longer validated by the security system.

What happened? The update resulted in the withdrawal of all of the company's Border Gateway Protocol (BGP) routes. BGP is at the heart of the Internet's architecture: if the Internet is a network of networks, BGP is the glue that holds it together, letting each network announce to the others which addresses it can reach. Without these routes, the company's DNS servers became unreachable, and the rest of the world could no longer find out where Facebook's services were.
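The dependency chain described above (routes, then DNS, then everything else) can be illustrated with a toy model. This is only a sketch under stated assumptions: the prefix, the server address, the `example.com` zone and the helper names `is_reachable` and `resolve` are all hypothetical and do not reflect Facebook's actual network.

```python
# Toy model: every name and address here is illustrative, not Facebook's real topology.
from ipaddress import ip_address, ip_network

# Prefixes the (hypothetical) company announces to the rest of the Internet via BGP.
announced_routes = {ip_network("192.0.2.0/24")}

# Authoritative DNS server for the example domain, hosted inside that prefix.
AUTHORITATIVE_DNS = ip_address("192.0.2.53")
ZONE = {"example.com": "192.0.2.10"}


def is_reachable(addr):
    """An address is reachable only if some announced prefix covers it."""
    return any(addr in prefix for prefix in announced_routes)


def resolve(name):
    """Return the address for `name`, or None if the authoritative server is unreachable."""
    if not is_reachable(AUTHORITATIVE_DNS):
        return None  # a real resolver would typically answer SERVFAIL here
    return ZONE.get(name)


before = resolve("example.com")
announced_routes.clear()  # the faulty update withdraws every route
after = resolve("example.com")
print(before, after)
```

In this model the servers themselves never stop running. They simply become invisible once no route leads to them, which is enough to make every name they serve unresolvable.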

The cascade of problems started around 5:40 p.m. (French time): the BGP routes disappeared and Facebook's DNS servers stopped responding. Very quickly, the effects chained together and all services became inaccessible, whatever the platform. Cloudflare said it noticed that its DNS resolver 1.1.1.1 could no longer resolve facebook.com, to the point of wondering whether the problem was on its own end. Its teams quickly understood, however, that the failure came from Facebook.
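For readers who want to see what "could no longer resolve" looks like on the wire, here is a minimal sketch that builds a standard DNS query for an A record and reads the response code from the reply header. The function names are hypothetical, and the assumption that a resolver reports this kind of failure as SERVFAIL (code 2) is a general property of DNS rather than something stated in the article.

```python
import socket
import struct

RCODE_NAMES = {0: "NOERROR", 2: "SERVFAIL", 3: "NXDOMAIN"}


def build_query(name, query_id=0x1234):
    """Build a minimal DNS query for the A record of `name`."""
    # Header: ID, flags (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # The name is encoded as length-prefixed labels ending with a zero byte.
    labels = name.rstrip(".").split(".")
    qname = b"".join(bytes([len(label)]) + label.encode("ascii") for label in labels) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def response_code(message):
    """Extract the 4-bit RCODE from the flags field of a DNS message."""
    flags = struct.unpack("!H", message[2:4])[0]
    return flags & 0x000F


def query_rcode(name, resolver="1.1.1.1", timeout=3.0):
    """Send the query over UDP and return the resolver's response code (needs network access)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name), (resolver, 53))
        data, _ = sock.recvfrom(512)
    return response_code(data)
```

Calling `RCODE_NAMES.get(query_rcode("facebook.com"))` against a public resolver would be expected to give `NOERROR` on a normal day. During an outage of this kind the resolver cannot reach the authoritative servers, so the call would more likely report `SERVFAIL` or time out.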

The failure had multiple consequences for other services and infrastructures, including those of Cloudflare and Google. Their public DNS resolvers, 1.1.1.1 and 8.8.8.8 respectively, recorded colossal spikes in requests as users tried to use them to reach Facebook services. Activity was up to 30 times higher than normal, with many failures.

In its report on the incident, Cloudflare also notes a significant increase in requests to other services, such as Twitter, Signal and even TikTok. Signal confirmed this last night, tweeting that millions of people had just joined the service, which caused a few problems as a result.

Everything began to slowly return to normal around midnight. A short time later, Mark Zuckerberg posted an apology: “Sorry for the interruption, I know how much you rely on our services to stay connected with the people you care about”. In a short post, Facebook confirmed that everything was back to normal and renewed its apologies.

The company says it will still need time to analyze the consequences of this failure and to ensure that the conditions that caused it do not recur. The failure has the merit of casting a harsh light on Facebook as a SPOF (Single Point Of Failure). Not only are Messenger and WhatsApp the two most used messaging services in the world, but many other services use or offer Facebook Connect to simplify authentication.

The fact that the disappearance of a single actor has such repercussions invites reflection.
