Outages at The Planet

The web hosting industry is a tough industry. There are countless potential technical problems, ruthless competition, somewhat low margins, and then, to top it off, trucks and explosions.

San Antonio, Texas-based Rackspace dealt with a truck knocking out its power (and, subsequently, quite a few servers). Most recently, though, The Planet (another large dedicated hosting company and an indirect competitor of Rackspace) had a portion of their datacenter explode.

The cause of The Planet’s issues is fairly simple: an electrical system shorted and caught fire. The fire caused an explosion that “knocked down three walls” surrounding the datacenter’s electrical equipment room. Power for most of the datacenter then went out. The fire department, after inspecting the damage, told The Planet that they were not allowed to turn on their backup generators because of safety concerns. In all, 9,000 servers and 7,500 customers were affected. Needless to say, it was a large outage.

The company started responding and posting updates on their forums almost immediately. An hour after the incident was first reported, employees promised to post updates every 15 minutes on the forums (a promise they kept for the most part). The first significant update came about four hours after the incident was first mentioned.

Something The Planet did really well was keep up with the updates. Even if they had to post “there are no additional updates at this time,” they still kept their customers in the loop. The company then developed a plan of action and reminded customers that they fully intended to keep to their SLA promises and commitments. As it got later into the night, the company started posting less frequently, but they never stopped posting updates. They brought in additional teams and support technicians to help fix the problems and man the phones. Twenty-eight hours after the issue first occurred, the company’s CEO posted an update on the forums that briefly and effectively communicated what was going on and what was going to happen.

I really liked how The Planet communicated their priorities (restoring service) very clearly. I also liked how they provided updates every 15 minutes while they were learning about the issue and what it meant for customers. Their temporary web site was effective at providing updates as well.

For all that it did well, The Planet also did some things I did not particularly care for. Posting about sales and promotions while a good portion of a large datacenter is down is inappropriate. That has the potential to annoy a lot of customers, and I’m sure the company can afford to hold off on the promotions for a day or two. The company’s official blog has yet to mention the outage. Their web site doesn’t talk about it, either. More importantly, neither of them mentioned the outage while it was still happening (all services were fully restored as of a few hours ago).

Overall, the company handled the issue well. What The Planet does over the next few days will determine what a lot of customers and a lot of the web hosting industry think of the company.

