I was asked last week what the continuous uptime of my data center was. I knew roughly, but after some digging and some math I learned that it was more than 125,000 hours of continuous uptime.
What really got me, though, was how little time, effort, and coverage that number gets. It got me thinking: with all of the coverage, articles, and studies on efficiency, why so little focus on the only thing that matters when the shit hits the fan? Do you really think that in the middle of an outage companies are freaking out over how awesome their PUE is? Me neither.
So I wrote a blog post about the fact that we already have the ultimate yardstick for measuring our data center spend and everything that goes into it: cash. We can easily see whether one thing costs more or less than another, decide whether we want to spend more or less, buy on value vs. price, and so on.
So why is so little time spent on the uptime number?
Well, if yours sucks, you'd rather point to other measurements that paint you in the best possible light, duh. My other hunch is that PUE (to take a widely discussed example) is one of those nebulous numbers whose measurement is open to debate, and that debate keeps the dust in the air so you don't see the only thing that matters: uptime.
Why do I say it's the only thing that matters?
Because it is the bedrock on which SLAs are written. I was chairing a panel at a BisNow event in Virginia and asked the panelists: "What is the one thing you want to tell the vendors in the audience today?" Mike Manos from AOL had the best response: "Vendors, do not hand me an SLA that says 100% uptime with maintenance windows in it. 100% is either 100% or it's not."
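To make Manos's point concrete, here is a minimal sketch in Python using made-up numbers, not figures from my data center: twelve hypothetical four-hour maintenance windows in a year and zero unplanned outages. The function names are illustrative only. If the SLA carves maintenance windows out of the measured period, the result comes out at 100%; count every hour of the year and it doesn't.

```python
# Hypothetical illustration: how excluding scheduled maintenance from an
# uptime calculation can make a service look better than it really was.

HOURS_PER_YEAR = 365 * 24  # 8,760 hours in a non-leap year


def uptime_percent(period_hours: float, downtime_hours: float) -> float:
    """Uptime as a share of the whole period, counting every hour down."""
    return 100.0 * (period_hours - downtime_hours) / period_hours


def uptime_excluding_maintenance(period_hours: float,
                                 unplanned_hours: float,
                                 maintenance_hours: float) -> float:
    """Uptime computed the way some SLAs do it: maintenance windows are
    removed from the period entirely, so they never count as downtime."""
    measured_hours = period_hours - maintenance_hours
    return 100.0 * (measured_hours - unplanned_hours) / measured_hours


# Assumed example numbers, not real data: 12 four-hour windows, and we
# assume the service is actually unavailable during each window.
maintenance = 12 * 4.0
unplanned = 0.0

sla_style = uptime_excluding_maintenance(HOURS_PER_YEAR, unplanned, maintenance)
actual = uptime_percent(HOURS_PER_YEAR, unplanned + maintenance)

print(f"SLA-style uptime: {sla_style:.3f}%")
print(f"Actual uptime:    {actual:.3f}%")
```

With these assumed numbers the first line prints 100.000% and the second 99.452%: the "100%" SLA quietly hides 48 hours of the year when the service was down.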
So let's focus on uptime, because as a measurement tool it's as easy to use as cash, and it applies to applications, hardware, and the data center itself.
Who's with me?