Wednesday, August 21, 2013

Google remains silent over outage

Several days after Google's entire infrastructure became unavailable for a full 5 minutes, there is still no clear picture as to what happened. Google have yet to make any statement on the matter, and their silence is in itself creating a breeding ground for speculation.

While Internet traffic dropped by a massive 40% during the outage, not everyone would have been affected, or even have noticed. The outage occurred at around 15:55 PDT, meaning that while much of America might have been affected by the glitch, those in Europe would likely have been hitting the sack. Even many parts of Asia would likely not have seen any problems, given that the sun may only just have been peeking over the horizon.
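The time-zone reasoning above can be checked with a rough sketch. The date of 16 August 2013 is an assumption inferred from this post's date and its later reference to "Friday last week", and the fixed UTC offsets (summer time in Europe, UTC+8 for China) and city labels are likewise illustrative assumptions rather than reported facts.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets are assumptions valid for mid-August 2013 (daylight saving in effect).
PDT = timezone(timedelta(hours=-7), "PDT")
ZONES = {
    "London": timezone(timedelta(hours=1), "BST"),
    "Berlin": timezone(timedelta(hours=2), "CEST"),
    "Beijing": timezone(timedelta(hours=8), "CST"),
}

# Assumed start of the outage: Friday 16 August 2013, around 15:55 PDT.
outage_start = datetime(2013, 8, 16, 15, 55, tzinfo=PDT)


def local_times(moment):
    """Return the same instant expressed in each of the assumed zones."""
    return {city: moment.astimezone(tz) for city, tz in ZONES.items()}


for city, local in local_times(outage_start).items():
    print(f"{city}: {local:%Y-%m-%d %H:%M} {local.tzname()}")
```

Under these assumptions the outage began at 23:55 in London, 00:55 the next day in Berlin, and 06:55 the next morning in Beijing.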

Financial impacts

The down time is significant nonetheless, since this was not just a run-of-the-mill website. This was Google, a multinational Internet giant which stores petabytes of data for millions of users around the world. In business, time is money, and even a few minutes of downtime can mean lost trade and financial losses.

Google is said to have lost around $500,000 during the minutes-long window, according to one marketing executive. However, such a sum is relatively insignificant for a company that generates $40 billion. Furthermore, the spike of web activity that occurred once services were restored likely offset any losses.

For smaller companies the losses were potentially more significant. People searching for products may have been unable to find them through Google searches, or through Google's own AdSense and AdWords programs.

Safety concerns

The most concerning aspect for some is the fact that Google went down at all, and whether data stored on Google's servers is safe.

The fact that Google managed to bring everything back online so quickly should be of some comfort. In fact, some have expressed surprise that outages don't occur more frequently.

"I'm not surprised Google drops off the planet for 5 minutes," wrote one commentator, "I'm surprised it doesn't happen more often, and I'm astonished they get it back online in 5 minutes."

Whatever caused the problem, Google's recovery programs certainly proved to be a lot faster than those seen at certain banking giants, such as NatWest, which saw its computer systems crash for nearly a week in 2012 [BBC], or at other Internet companies, including software giant Microsoft, which has had its fair share of outages, some lasting several days.

Up time guarantees

Google offers paying customers of its services 99.9% up time, and offers a certain number of free days of service in each billing cycle if it cannot meet that level of availability. And while last week's outage was significant, Google is doing better than some of its rivals in providing a reliable cloud service. In fact, in 2011 the UK's advertising watchdog began investigating Microsoft over its claim that the software giant "can guarantee 99.9 percent uptime" of its Office 365 cloud service [ZDNet]. Microsoft saw outages lasting several hours in 2011 [ZDNet] and in 2012 [CNET].

A guarantee of 99.9% up time amounts to a possible outage of around 8.77 hours over the course of a year, a far cry from the 5 minutes or so Google was down on Friday last week. In fact, 5 minutes amounts to less than 0.001% of a year, which would equate to up time of 99.999% if it were the only outage in that period. However, promising such guarantees would inherently increase running costs.
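For readers who want to check the arithmetic, here is a minimal sketch. It assumes an average year of 365.2422 days, and the function names are purely illustrative; none of this reflects how Google itself measures availability.

```python
# Assumption: an average year of 365.2422 days, i.e. roughly 8,765.8 hours.
HOURS_PER_YEAR = 365.2422 * 24


def allowed_downtime_hours(uptime_percent):
    """Hours of downtime per year permitted by a given up time percentage."""
    return HOURS_PER_YEAR * (100.0 - uptime_percent) / 100.0


def downtime_percent_of_year(minutes):
    """Share of a year, as a percentage, taken up by an outage of this length."""
    return minutes / (HOURS_PER_YEAR * 60.0) * 100.0


print(f"99.9% allows {allowed_downtime_hours(99.9):.2f} hours a year")
print(f"99.999% allows {allowed_downtime_hours(99.999) * 60:.2f} minutes a year")
print(f"A 5 minute outage is {downtime_percent_of_year(5):.5f}% of a year")
```

On these assumptions a 99.999% guarantee would leave room for only a little over 5 minutes of downtime a year, which is why a single 5 minute outage sits right at that threshold.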

Previous outages

Google has suffered outages before, though they have generally only affected single services. In April 2012 its Gmail service went dark, leaving more than 35 million users without access to their email [tvnewswatch - April 2012]. In some ways that outage was far more significant, in that it lasted for nearly 2 hours. Prior to the April 2012 crash, Gmail had only been down seven times since its 2005 launch. At that time Google reported that Gmail was available 99.984% of the time in 2010, and 99.99% in 2011.

The year before, bloggers were left unable to post or access their accounts after Google's Blogger service went down for a full 20 hours [tvnewswatch - May 2011]. Concerning the cause, Google said only that they had "experienced some data corruption" during scheduled maintenance.


Google's silence over this most recent outage has prompted all sorts of theories. Most analysts were confident the outage was not the result of hackers, but many posted jokes on Twitter about the cause.

"Google went down because it was told it could no longer have 20 percent time and didn't like it," wrote Danny Sullivan, editor of the Search Engine Land blog, referring to the time Google employees are given to work on projects outside their job description.

"Somebody in Mountain View probably unplugged something, then plugged it back in," Greg Sterling, a researcher with Sterling Market Intelligence, wrote. On newspaper websites some joked that Skynet had just come online, a reference to the film The Terminator, in which computers become self-aware and wage war against mankind. Others suggested that Google may have done it deliberately to "prove a point".

There were, however, more serious suggestions. Sterling also told The Financial Times, "This individual outage doesn't matter. The idea that Google could go down is unsettling to people but it doesn't create a problem for the company unless it starts to happen more frequently."

Most analysts said the black-out was unlikely to have been caused by hacking, but said it could not be ruled out. Most seemed to think the outage was caused by a physical infrastructure problem or human error. But this is in itself unsettling.

Single point of failure?

Matt Oxley, head of creative technology at Tribal Worldwide, told Sky News, "To have a complete outage points to a single point of failure, but what are the single points of failure in a multi-national organisation such as Google?"

"The simplest and most obvious would be the Domain Name System (DNS). Did someone manage to hack/attack their DNS? Or did someone make a mistake and push an update (internal or external) that caused a cascading error through their name servers?"

"The other possibility is they do actually have a single point of failure in their infrastructure that has been exposed either by external or internal circumstances (attack or mistake). If it was either of these cases then they won't want to publicise it before they fix the error. If it is the latter, then it could be embarrassing and damaging to their business/reputation."

Perhaps by staying quiet Google believes the whole issue will blow over and be forgotten.

More reports: Sky News / FT / Forbes / Daily Mail / CNET / ITProPortal / Marketing Pilgrim / ZDNet

tvnewswatch, Kunming, Yunnan, China
