Wednesday, September 02, 2009

Gmail outage caused by maintenance

Routine maintenance was behind an outage that affected millions of Gmail users on Tuesday, Google has said in a statement. Google confirmed that its e-mail service was unavailable to the "majority" of its 150 million worldwide users for nearly two hours on Tuesday, and said the cause of the outage had been identified.

It is not the first time Gmail has suffered an outage. In May a major technical issue left millions unable to use Gmail, Google News or the company's main search engine. Similar outages also occurred in February and March this year. The search giant has also been the target of DDoS attacks and of restrictions on its services in a number of countries. In June this year China blocked both the search engine and Gmail.

On its official blog Google's Engineering Director David Besbris said, "We know many of you are having trouble accessing Gmail right now - we are too, and we definitely feel your pain." The free Gmail service has been ranked as the world's third most popular e-mail service, behind those provided by Microsoft and Yahoo.

Users around the globe first experienced problems at 19:45 GMT on Tuesday with both email and chat facilities being affected. According to the company the "widespread outage" lasted about 100 minutes. Google said it took the matter extremely seriously given that many people relied on the service for both personal and professional communications. "I'd like to apologize to all of you — today's outage was a Big Deal, and we're treating it as such," said Ben Treynor, Vice President of Engineering and Site Reliability Czar. Following a thorough investigation Google said it was compiling a list of things that needed to be fixed or improved.

Part of the problem lay in routine maintenance. After a small fraction of Gmail's servers were taken offline for routine upgrades, increased traffic effectively overwhelmed the system, Google said. A few of the request routers became overloaded and in effect told the rest of the system "stop sending us traffic, we're too slow!" This transferred the load onto the remaining request routers, causing a few more of them to become overloaded as well, and within minutes nearly all of the request routers were overloaded. IMAP/POP access and mail processing continued to work normally because these requests do not use the same routers.
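
The chain reaction Google describes can be illustrated with a toy model. The sketch below is not Google's routing logic: the function name, the even split of load, and the rule that roughly one in ten overloaded routers drops out per round are all assumptions made for illustration.

```python
def simulate_cascade(num_routers, capacity_per_router, total_load, offline):
    """Return the count of healthy routers after each redistribution round.

    Toy model (all rules here are illustrative assumptions):
    - load is split evenly across the routers still accepting traffic;
    - if the per-router share exceeds capacity, a few routers (one in ten,
      at least one) stop accepting traffic, pushing their share onto the rest.
    """
    healthy = num_routers - offline
    history = [healthy]
    while healthy > 0:
        share = total_load / healthy
        if share <= capacity_per_router:
            break  # the remaining routers can absorb the load
        dropped = max(1, healthy // 10)
        healthy -= dropped
        history.append(healthy)
    return history


# 100 routers running close to capacity: stable until some go offline.
print(simulate_cascade(100, 10.0, 950.0, offline=0))
# Taking 10 offline tips the rest over, round after round.
print(simulate_cascade(100, 10.0, 950.0, offline=10))
# With extra capacity per router, the same maintenance is absorbed.
print(simulate_cascade(100, 12.0, 950.0, offline=10))
```

In this model, once the per-router share exceeds capacity, every router that drops out only raises the share for the others, so the collapse cannot stop on its own. Extra capacity per router is what lets the same maintenance pass without incident.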

To prevent similar problems from occurring again, Google says it is "increasing request router capacity well beyond peak demand to provide headroom", as well as looking at "more subtle" actions. Despite the glitch, Google has insisted Gmail remains more than 99.9% available to all users, and that events such as this are notable for their rarity. Gmail moved out of Beta in July, signalling a more stable and reliable service. However, given its widespread impact, this recent outage will shake the confidence of both current and prospective users of the service.
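
To put the 99.9% figure in context, here is a rough calculation of what a single 100-minute outage does to availability over different windows. It assumes no other downtime in the window, which is unlikely to hold given the earlier outages this year, and it is not how Google says it measures availability.

```python
OUTAGE_MINUTES = 100  # duration reported for Tuesday's outage


def availability(downtime_minutes, window_minutes):
    """Percentage of the window during which the service was up."""
    return 100.0 * (1 - downtime_minutes / window_minutes)


MINUTES_PER_30_DAY_MONTH = 30 * 24 * 60  # 43,200
MINUTES_PER_YEAR = 365 * 24 * 60         # 525,600

monthly = availability(OUTAGE_MINUTES, MINUTES_PER_30_DAY_MONTH)
yearly = availability(OUTAGE_MINUTES, MINUTES_PER_YEAR)

print(f"{monthly:.2f}% over a 30-day month")  # 99.77%
print(f"{yearly:.2f}% over a 365-day year")   # 99.98%
```

On these assumptions the outage alone pushes a single month below 99.9%, while a full year with no other downtime would stay above it.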
