On Sunday, August 30th, 2020, CenturyLink suffered a major outage that affected numerous internet companies, including Amazon, Twitter, Microsoft (via Xbox Live), EA, Blizzard, Steam, Discord, Reddit, Hulu, NameCheap, OpenDNS, PhishingBox, and many others. Cloudflare, another of the major companies affected, wrote about the outage on its blog: “Globally, we saw a 3.5% drop in global traffic during the outage, nearly all of which was due to a nearly complete outage of CenturyLink’s ISP service across the United States.”

According to a tweet from Andree Toonk, a Dutch Internet engineer and founder of BGPmon, “The root cause of the @CenturyLink outage appears to be related to a Flowspec rule. (flowspec is a way to quickly distribute network ACLs via BGP)”. Flowspec is an extension of the Border Gateway Protocol (BGP) that lets companies distribute firewall-style filtering rules across their network through BGP itself, the same protocol that carries their routes. This method of rule distribution is usually used when responding to security incidents such as DDoS attacks.
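To make the idea concrete, here is a minimal sketch of what a Flowspec-style rule does once it reaches a router: it pairs a set of match criteria with an action. This is not a real BGP implementation and not CenturyLink's rule, whose contents are not given here. The names FlowRule, Packet, matches, and apply_rules are hypothetical, and only a few of the possible match fields are modeled.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class FlowRule:
    """Hypothetical, simplified stand-in for a Flowspec rule."""
    dst_prefix: str                                  # destination prefix to match
    protocol: Optional[int] = None                   # IP protocol number (17 = UDP)
    dst_port: Optional[int] = None                   # destination port
    length_range: Optional[Tuple[int, int]] = None   # packet length in bytes
    action: str = "discard"                          # what to do on a match


@dataclass(frozen=True)
class Packet:
    dst_ip: str
    protocol: int
    dst_port: int
    length: int


def matches(rule: FlowRule, pkt: Packet) -> bool:
    """A packet matches only if every criterion the rule sets is satisfied."""
    if ip_address(pkt.dst_ip) not in ip_network(rule.dst_prefix):
        return False
    if rule.protocol is not None and pkt.protocol != rule.protocol:
        return False
    if rule.dst_port is not None and pkt.dst_port != rule.dst_port:
        return False
    if rule.length_range is not None:
        low, high = rule.length_range
        if not low <= pkt.length <= high:
            return False
    return True


def apply_rules(rules: List[FlowRule], pkt: Packet) -> str:
    """Return the action of the first matching rule, or 'accept' if none match."""
    for rule in rules:
        if matches(rule, pkt):
            return rule.action
    return "accept"


# Example: drop UDP traffic to port 53 aimed at a documentation prefix.
rules = [FlowRule(dst_prefix="203.0.113.0/24", protocol=17, dst_port=53)]
attack = Packet(dst_ip="203.0.113.10", protocol=17, dst_port=53, length=512)
normal = Packet(dst_ip="198.51.100.1", protocol=17, dst_port=53, length=512)
```

Because every router that receives the rule applies it right away, one announcement changes filtering across the whole network at once. That speed is what makes the mechanism useful during an attack, and it is also why a bad rule spreads just as fast.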

Andree Toonk also pointed out the similarities between Sunday’s outage and one Cloudflare had in 2013. In that outage, Cloudflare’s engineers detected a DDoS attack in progress, and their internal attack profiler reported that the attack packets were between 99,971 and 99,985 bytes long, far larger than the roughly 1,500 bytes that packets normally top out at. When the engineers pushed a new rule via Flowspec to Cloudflare’s edge network, the routers tried to match packet sizes that could not exist, consumed all of their RAM, and crashed. Sam Bowne, a computer science professor at City College of San Francisco, captured a video of the network’s collapse, using BGPlay to record the BGP sessions as they were withdrawn.
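One way to see why that rule was suspicious is that the length field in an IPv4 header is 16 bits wide, so no IPv4 packet can exceed 65,535 bytes. The following sketch shows the kind of sanity check that could flag such a rule before it is pushed. It is an illustration only, not Cloudflare's actual tooling or fix, and the function name validate_length_range is hypothetical.

```python
from typing import List

# The IPv4 "total length" header field is 16 bits, so this is the hard ceiling.
MAX_IPV4_PACKET_BYTES = 65_535


def validate_length_range(low: int, high: int) -> List[str]:
    """Return a list of problems with a packet-length match; empty means plausible."""
    problems = []
    if low < 0 or high < 0:
        problems.append("packet lengths cannot be negative")
    if low > high:
        problems.append("lower bound is greater than upper bound")
    if high > MAX_IPV4_PACKET_BYTES:
        problems.append(
            f"upper bound {high} exceeds the IPv4 maximum of {MAX_IPV4_PACKET_BYTES} bytes"
        )
    return problems


# The range reported by the attack profiler in the 2013 incident.
suspicious = validate_length_range(99_971, 99_985)

# A range that real packets could actually fall into.
plausible = validate_length_range(500, 1_500)
```

A check like this only catches values that are impossible by definition. It would not stop a rule that is valid but simply wrong for the network, which is why review and staged rollout of rule changes still matter.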

 

Both events stemmed from a single rule change that engineers quickly propagated across their respective companies’ networks. There was no malicious intent, and both teams were following standard procedure, which Cloudflare later updated and CenturyLink likely did as well. However, the outages show how much risk a single employee, or a group of employees, could pose to your network and business operations if compromised. Regularly testing and training employees to protect themselves and your organization against social engineering attacks could make the difference in preventing a system-wide crash.