Facebook insiders claim the tech giant’s seven-hour outage that took all of its services offline was exacerbated by employees working from home as staff were locked out of remote messaging systems and company buildings.
The massive outage on Monday was a minor annoyance for casual Facebook users, and a grave concern to the countless small businesses that rely on the platform and its subsidiaries WhatsApp and Instagram as vital channels to communicate with customers.
Facebook is also used to log in to many other apps and services, leading to unexpected issues accessing shopping accounts, dating apps, smart home devices, and a wide range of other services.
For Facebook employees themselves, the situation was dire. Striking just after 8.30am in California, the outage left employees arriving at the Menlo Park headquarters locked out as the company’s access card system went down, according to multiple reports.
For the roughly three-quarters of Facebook employees working remotely, the company’s internal messaging system Workplace was also knocked offline by the outage, leaving a skeleton crew at the company’s main Santa Clara data center cut off from assistance as they raced to debug the network servers.
Teams dispatched to the main data center also reportedly had difficulty accessing secure rooms to revert faulty configurations on key systems.
Jonathan Zittrain, director of Harvard’s Berkman Klein Center for Internet and Society, said in a tweet: ‘Facebook basically locked its keys in its car.’
Facebook has said that the historic outage was caused by ‘configuration changes on the backbone routers that coordinate network traffic between our data centers.’
It is believed that a faulty update to Facebook’s Border Gateway Protocol (BGP) configuration, which advertises to other networks the routes to Facebook’s systems, left apps and browsers unable to locate the company’s services.
A person claiming to be a Facebook employee said on Reddit that high numbers of staff working from home made the problem worse. The account was later deleted.
Facebook has led the way in allowing employees to work remotely during the pandemic, and one purported insider said on Reddit that the Santa Clara data center had lower staffing due to pandemic restrictions.
‘The nature of the problem meant Facebook would have needed network engineers to physically access their BGP routers – and due to the pandemic, some of the data centers quite possibly don’t have an engineer based on site, or someone who could have immediately started to work on the problem,’ Kieron Harding, an IT Infrastructure Engineer at GRC International Group, told DailyMail.com.
‘One of the reasons why the outage lasted for as long as it did was because the misconfiguration of the BGP also affected Facebook’s physical door access systems – which shut down; meaning engineers couldn’t get into the buildings, or secure rooms, to start fixing the issues straightaway,’ said Harding.
Harding and other experts have said an inadvertent mistake or sabotage by an insider were both plausible scenarios.
The failures of internal communication tools and other resources that depend on the company’s network compounded the initial error, several Facebook employees told Reuters.
‘We want to make clear at this time we believe the root cause of this outage was a faulty configuration change,’ Facebook said in its blog post. ‘We also have no evidence that user data was compromised as a result of this downtime.’
The Facebook outage is the largest ever tracked by web monitoring group Downdetector.
The tech giant has around 60,000 employees globally and announced in May that it would be operating at 25 per cent capacity in its offices once they reopened in July.
CEO Mark Zuckerberg announced in June that the company would allow all full-time employees to work from home if their jobs can be done remotely, and has said he expects half the company’s workforce to work remotely over the next decade.
Data center technicians and others who work on physical hardware would be exempt from remote work, Zuckerberg said.
New York Times reporter Sheera Frenkel said in a tweet: ‘Was just on phone with someone who works for FB who described employees unable to enter buildings this morning to begin to evaluate extent of outage because their badges weren’t working to access doors.’
Meanwhile a purported insider on Facebook’s recovery team said on Reddit: ‘There are people now trying to gain access to the peering routers to implement fixes, but the people with physical access is separate from the people with knowledge of how to actually authenticate to the systems and people who know what to actually do, so there is now a logistical challenge with getting all that knowledge unified.
‘Part of this is also due to lower staffing in data centers due to pandemic measures.’
Facebook explained that Monday’s outage was caused by a faulty update that was sent to its core servers, which effectively disconnected them from the internet.
Engineers were rushed to the company’s data centers in Santa Clara to reset the servers manually, but it took until 2.45pm Pacific Time (10.45pm BST) for them to be reconnected due to the ‘logistical challenge’ of employees who could offer assistance being cut off and at home.
Employees use Facebook’s own services to communicate with each other, and with its internal messaging platform Workplace also down, many working from home were unable to do their jobs or discuss how to fix the issue.
Facebook engineers said in a statement: ‘The underlying cause of this outage impacted many of the internal tools and systems we use in our day-to-day operations, complicating our attempts to quickly diagnose and resolve the problem.
‘Our engineering teams have learned that configuration changes on the backbone routers that coordinate network traffic between our data centers caused issues that interrupted this communication.
‘This disruption to network traffic had a cascading effect on the way our data centers communicate, bringing our services to a halt.’
Zuckerberg apologized for the outage in a Facebook post on Monday, writing: ‘Facebook, Instagram, WhatsApp and Messenger are coming back online now. Sorry for the disruption today — I know how much you rely on our services to stay connected with the people you care about.’
‘To every small and large business, family, and individual who depends on us, I’m sorry,’ Facebook Chief Technology Officer Mike Schroepfer tweeted, adding that it ‘may take some time to get to 100%.’
The outage was the second blow to the social media giant in as many days after a whistleblower on Sunday accused the company of repeatedly prioritizing profit over clamping down on hate speech and misinformation.
Facebook, which is the world’s largest seller of online ads after Google, was losing about $545,000 in U.S. ad revenue per hour during the outage, according to estimates from ad measurement firm Standard Media Index.
Kevin Collier, an NBC news reporter, said: ‘Don’t yet know exactly what’s behind the DNS issue that’s knocked Facebook/Instagram/WhatsApp offline, but it’s really bad.
‘Pretty much everything that runs through those three companies are inaccessible. Employees can’t even enter conference rooms because they’re IoT (internet of things)!’
Facebook has 47 locations across North America but many are smaller data sites, while 15,000 people, around a quarter of the total workforce, are based in the Menlo Park headquarters.
Mark Zuckerberg has pledged to move to a working from home setup within the coming years and predicts that as much as half of the workforce will be remote within the next five to ten years.
The CEO said he would start ‘aggressively opening up remote hiring’, telling the Verge: ‘We’re going to be the most forward-leaning company on remote work at our scale.
‘We need to do this in a way that’s thoughtful and responsible, so we’re going to do this in a measured way. But I think that it’s possible that over the next five to 10 years — maybe closer to 10 than five, but somewhere in that range — I think we could get to about half of the company working remotely permanently.’
Facebook shares plunged 5 percent on Monday amid the outage, wiping some $48billion off its value, though the slide started before the tech problems, in part due to a whistleblower accusing the company of putting profits before safety in a 60 Minutes program broadcast Sunday night.
It marks the firm’s second-worst day on the markets ever. Facebook stock rebounded in early trading Tuesday, rising more than 1 percent.
In addition to the stock market slide, Facebook likely missed out on at least $67million in direct revenue and possibly as much as $102million during the outage – based on average hourly earnings across 2020 and projections of its 2021 hourly earnings from Q1 and Q2 results.
Zuckerberg’s own stake in Facebook fell by an estimated $7billion.
Facebook was already in the throes of a separate major crisis after whistleblower Frances Haugen, a former Facebook product manager, provided The Wall Street Journal with internal documents that exposed the company’s awareness of harms caused by its products and decisions.
Haugen went public on CBS’s ‘60 Minutes’ program Sunday and is scheduled to testify before a Senate subcommittee Tuesday.
Haugen had also anonymously filed complaints with federal law enforcement alleging Facebook’s own research shows how it magnifies hate and misinformation and leads to increased polarization. The research also showed that the company was aware that Instagram can harm teenage girls’ mental health.
The Journal’s stories, called ‘The Facebook Files,’ painted a picture of a company focused on growth and its own interests over the public good. Facebook has tried to play down the research.
Former Deputy Prime Minister Nick Clegg, the company’s vice president of policy and public affairs, wrote to Facebook employees in a memo Friday that ‘social media has had a big impact on society in recent years, and Facebook is often a place where much of this debate plays out.’
Source: Daily Mail UK