
99.9999% How many nines are enough in mobile?

All of us have probably heard the phrase "Five Nines", which means 99.999% system availability and is the mythical reliability target often quoted as a goal when running a computer system or service. There is a larger debate about what the number covers: whether it is only the "network" or whether it should also include applications, servers, etc.

I'm not going down that debate path today other than to state the obvious that the "more nines" the better. Instead I would like to use the standard availability table to describe one of the hidden realities that currently exists with mobile services. So first let's start with the table:

At the two ends of the table, 99.999% availability works out to roughly 5 minutes of downtime per year, while 90% works out to 36.5 days of downtime per year.
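For anyone who wants to check the arithmetic behind those figures, here is a minimal sketch. It assumes a 365-day year and treats availability as a simple percentage of total time; the function name is just an illustration.

```python
MINUTES_PER_DAY = 24 * 60
MINUTES_PER_YEAR = 365 * MINUTES_PER_DAY  # assumes a non-leap year


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the yearly downtime, in minutes, implied by an availability percentage."""
    unavailable_fraction = (100.0 - availability_percent) / 100.0
    return unavailable_fraction * MINUTES_PER_YEAR


# Print the downtime for each step from "one nine" to "five nines".
for pct in (90.0, 99.0, 99.9, 99.99, 99.999):
    minutes = downtime_minutes_per_year(pct)
    days = minutes / MINUTES_PER_DAY
    print(f"{pct}% available: {minutes:.2f} minutes/year ({days:.3f} days)")
```

Each extra nine cuts the allowed downtime by a factor of ten, which is why the gap between 90% and 99.999% is so large.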

So now the question is:

Would it be acceptable for a production mobile service to only be available 90% of the time?

My company has seen examples of this low level of service from even the most well known of companies. One of the most problematic areas seems to be SMS services and especially short code programs.

A short code program is one where you text a special keyword or phrase like "Pizza 98065" to a short number that works much like a cell phone number. The number routes your keywords to an application, which then returns some form of answer to your handset. There are many examples of short code text messaging programs for looking up stock quotes, checking the weather, looking up an account balance, checking your airline flight, etc.

I think most people would have a "reasonable expectation" that when they send out a text message to a short code, some form of answer will be returned to them in a time-frame that is useful. If you are trying to look up your checking account balance so you know if it is possible to use your debit card in order to make a purchase, most people would expect the answer to come back in a few minutes or less, not hours from when the request was sent.

My company is able to monitor the performance and availability of any type of short code program. We were very surprised to see examples of short code services from major, well-known companies where the success rate of the short code request is at the 90% level or lower. This means that over the course of a year, that service isn't working for more than one full month.

To be fair, the types of common problems we are seeing aren't always coming from the actual application behind the short code, but instead problems are coming from the SMS aggregators processing the messages. The SMS aggregator is the intermediate party (company) sitting between the network operator (AT&T/Sprint/Verizon/T-Mobile) and the actual application that processes the keywords recognized by the short code.

When you vote via text message on a TV show, your text message flows from the network operator to a SMS aggregator, which then routes your text message to the owner of the application processing your vote. The message you get back saying "Thanks for your vote...." follows the reverse path, from the application owner, to the aggregator, and then back on to the operator network and then finally to your mobile device. There are many other examples of short code promotions that follow this same model.

One of the most common problems we have seen is that the message reply "never comes back", which means you sent your text to a short code but you never receive any type of reply in a reasonable amount of time. Your message has gone into "limbo".
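To make "success rate" concrete, here is a rough sketch of how a test request could be scored: a request counts as a success only if a reply arrives within a cutoff, and anything later (or never answered) counts as lost in limbo. The `Probe` record, the field names, and the 120-second cutoff are all assumptions made for illustration, not a description of any particular monitoring product.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Probe:
    """One test message sent to a short code (hypothetical record layout)."""
    sent_at: float             # seconds since some reference time
    reply_at: Optional[float]  # None means no reply was ever received


def success_rate(probes: Sequence[Probe], timeout_seconds: float = 120.0) -> float:
    """Return the percentage of probes answered within timeout_seconds."""
    if not probes:
        raise ValueError("need at least one probe to compute a success rate")
    answered_in_time = sum(
        1
        for p in probes
        if p.reply_at is not None and (p.reply_at - p.sent_at) <= timeout_seconds
    )
    return 100.0 * answered_in_time / len(probes)


# Made-up sample data: two timely replies, one late reply, one that never came back.
sample = [Probe(0.0, 5.0), Probe(0.0, 60.0), Probe(0.0, 500.0), Probe(0.0, None)]
print(f"Success rate: {success_rate(sample):.1f}%")
```

The cutoff matters as much as the count: a reply that shows up hours later is scored the same as no reply at all, which matches what a user waiting on an account balance would experience.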

Here is an example graph from a short code service that allows you to submit general information queries via a short code. The graph lines "dipping down" toward the bottom indicate that the success rate of the service is dropping. In this case the last week's worth of data shows that this service is only successful 58% of the time on average. And it is from a service that just about all of us would recognize.

We are currently advising our customers not to assume that their SMS services are running at 99% or higher. In reality, very few of the ones we have seen are running at this level. Many are running down in the 90% range, and a few, like the one I have shown here, have major problems that need to be fixed.

If you are concerned about the availability of your mobile services, it is important to develop some type of strategy that will give you visibility into what is happening in the real world. I'm sure services like the one above went through extensive QA testing, but once a service is released out into the hands of real users, you might get a different result than what you established during pre-production testing.
