Friday, July 07, 2006

 

Shuttle SN25P and faulty SATA

About six months ago, my wife bought a Shuttle SN25P and put together a pretty nice system. It has a decent Athlon 64, 2GB RAM and a 74GB WD Raptor, all of which have been performing pretty well, except for a couple of things. She took the system to Dallas and found that when powering it up, it couldn't find the HDD. After she reseated the SATA and power cables, it started behaving normally. During another recent trip, I shut her machine down to save power, only to find that it wouldn't boot back up because of the same issue. HDD access seemed to be behind everything. There were occasional lockups that left the HDD light on solid, and occasional bouts of not being able to find the HDD. So, given the really slick engineering of this system, my first suspicion was the Raptor. I pulled down Western Digital's Data Lifeguard suite and took a look at the SMART readings - all were nominal. That seemed odd on a system that showed every symptom of a gradual HDD failure - the exact sort that SMART was designed to detect.

Fast forward to two weeks ago. Tracey told me that the computer seemed to be in serious trouble. I powered it up and Windows XP started booting to safe mode. It threw an error about not being able to read the registry and seemed to hang (HDD light on again). Now, booting shows an error - the SYSTEM registry hive is completely gone. This is one of the worst things that can happen to a Windows machine, because at this point the best she can hope for is to salvage the data on the drive. Installing a clean SYSTEM registry hive will wreck many of the installed programs, probably leaving them virtually irreparable.

I had a hunch that it might be the motherboard, so I did a quick Google search for "SN25P SATA problem" and hit paydirt. Shuttle is shipping these with, of all things, defective SATA cables. Post after post describes exactly the symptoms she was seeing. It is highly frustrating to think that a $1 cable replacement could have prevented the loss of her system. Then again, it's also really difficult to fathom a defective cable in a PC. I realize that these little cables are transferring tons of information, but to see a cable failure on an internal component... it's beyond rare. So, my opinion of Shuttle has gone down a little bit. Not their Engineering, but their Support.

One of the most serious problems facing any support organization is how to communicate defects to customers. I worked support for a large ISV for several years. We actually had a policy that words like defect or bug could not be used. (This mindset was later changed to a certain degree.) So, here are the possibilities as far as Shuttle support is concerned.

Shuttle support really might not know this is a widespread problem. My gut reaction is to see this as a technology problem: their case management system could be poor or nonexistent. What looks like an oddball problem or random defect to one customer should be easy to identify in a case management system as a widespread problem affecting many customers. In reality, though, this is more of a process problem. Serious corporate IT departments and helpdesks follow a set of practices called ITIL. I think it is a failing of the software and hardware industries that they do not apply these ITIL practices to Tech Support. Basically, the relevant sections of ITIL define a customer support request as having two components: a support request ID and a problem ID. For instance, if 10 people wrote in about SATA problems on their SN25P, ultimately this should result in 10 different support request IDs and a single problem ID. Counting the requests attached to each problem ID is the first step towards determining the real scope of a problem.
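To make the request ID / problem ID split concrete, here is a minimal Python sketch of how a case management system might count the support requests attached to each problem. Everything in it is made up for illustration: the SupportRequest class, the requests_per_problem helper, and the ID formats are my own assumptions, not anything taken from ITIL itself or from Shuttle's actual system.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SupportRequest:
    request_id: str  # unique per customer contact (hypothetical format)
    problem_id: str  # shared by every request with the same root cause
    summary: str


def requests_per_problem(requests):
    """Map each problem ID to the list of request IDs linked to it."""
    grouped = defaultdict(list)
    for req in requests:
        grouped[req.problem_id].append(req.request_id)
    return dict(grouped)


# Ten hypothetical customers reporting the same SN25P SATA symptom.
requests = [
    SupportRequest(f"SR-{n:04d}", "PRB-SATA-CABLE", "SN25P cannot find HDD at boot")
    for n in range(1, 11)
]
# One unrelated request, to show that it stays under its own problem ID.
requests.append(SupportRequest("SR-0011", "PRB-FAN-NOISE", "Fan is loud"))

scope = requests_per_problem(requests)
for problem_id, request_ids in scope.items():
    print(problem_id, len(request_ids))
```

The point of the design is that nobody has to notice the pattern by hand. Once each request carries a problem ID, the scope of a problem is just a count.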

I believe Shuttle support does know about SATA cable problems on the SN25P. There are any number of reasons why they might not acknowledge it. The most optimistic is that the problem isn't widespread enough to note it as a defect. A defect that affects a very small portion (for instance, 0.01%) of your customers simply isn't cost effective to research, publicize, and fix. This may upset you, but then again, you wouldn't have purchased a Shuttle if its price had to cover the Engineering department chasing down every single issue experienced by every customer. The market would bankrupt them. Another possibility is that Shuttle support is not well connected to Engineering. There is a natural disdain between a support department and an Engineering department that can result in this type of problem going unresolved. I imagine this might be amplified by support being in the USA and Engineering being in Taiwan. Companies also fear that revealing serious defects may provide fodder to competitors. Finally, the support hierarchy (1st, 2nd, and 3rd line) may be ineffective at communicating this type of issue upward.

In any case, I'm replacing the SATA cable now.

Update: It's more than a month later and no lockups. It really was the SATA cable.
