Lessons from Rackspace

Rackspace had a really bad week last week. A truck literally crashed into part of their power system (which, as I understand it, was already having a few problems that day), and their entire datacenter in Fort Worth, Texas was taken down because it was at risk of overheating. When an entire Rackspace datacenter goes down, a lot of web sites go down: many of them well known, and an even greater number mission critical to their owners’ businesses. Rackspace’s customers pay a lot of money and have high expectations, so the situation was tough.

Rackspace, though, handled a very tough situation very well. The outage got a lot of publicity because many well-known web sites were down while Rackspace worked to restore service. A lot of blogs reported on the outage during and after the downtime.

From what I can tell, Rackspace did a fairly good job of handling the outage and its effects. I’m not a Rackspace customer, but I am pretty familiar with the company (see my interview with an executive from the company here). During the outage, they followed the guidelines I suggested for “keeping customers in the loop” almost perfectly.

First, the company set up a dedicated blog to update customers during and after the outage. The blog doesn’t have an archive, so I can’t tell what they posted while the outage was going on, but their follow-ups are good. They’ve posted in-depth information about what happened, what’s being done to fix it, and so on. They also posted a very nice timeline explaining what happened.

The company’s CEO, Lanham Napier, has made himself very visible. A majority of the posts (and a video) are from him. The CEO getting involved and talking to customers is important.

The company also kept customers in the loop through its account portal (where customers submit tickets, etc.) and through each customer’s point of contact. Just as critically, the company kept its own employees in the loop. That way, when customers called, they didn’t have to hear the “we know nothing” excuse that many companies (especially ISPs) give during an outage.

Here are some things they could have done better, or can still do:

  • Include perspectives on the blog and in press releases from people besides the CEO (who didn’t do poorly at all). Different perspectives help: IT people might want a more technical explanation than sales people would.
  • Brief journalists and customers more frequently. The more information reaches the more people, the better.
  • Offer refunds or credits to the affected customers. (I’m not 100% sure whether they did this.)
  • Follow up with customers in a month or two and make sure they are happy.

Overall, Rackspace did a good job of managing the events. The outage only amounted to a couple of hours of downtime, but that is a couple more hours in a month than Rackspace customers are used to.
