Wednesday, August 23, 2006


Mutating ITIL for ISVs and hardware manufacturers

In a previous post, I talked a little about how ITIL could help an organization like Shuttle identify and resolve widespread issues that it may not find out about within the confines of traditional technical support. I want to expand on a few concepts. ITIL is a big standard. For now, I am focusing on a small subset: incident and problem tracking.

The basic concept is that every customer call is at least one incident (or case). Every incident is associated with a problem. In a theoretical model where we assume perfect knowledge, a problem would contain complete information about the error messages, other symptoms, associated defects, and resolution.
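As a rough illustration of that relationship, here is a minimal sketch in Python. The class and field names (Problem, Incident, defect_ids and so on) are my own assumptions for the example, not ITIL terminology or any product's schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Problem:
    """One underlying issue, holding what the 'perfect knowledge' model lists."""
    problem_id: str
    error_messages: List[str] = field(default_factory=list)
    symptoms: List[str] = field(default_factory=list)
    defect_ids: List[str] = field(default_factory=list)
    resolution: Optional[str] = None  # unknown until someone works it out


@dataclass
class Incident:
    """One customer contact (a case); it always points at a problem."""
    incident_id: str
    customer: str
    problem_id: str


# Hypothetical sample data: two calls about the same underlying problem.
problem = Problem("P-1", error_messages=["incompatible version"])
incidents = [
    Incident("I-100", "customer-a", problem.problem_id),
    Incident("I-101", "customer-b", problem.problem_id),
]

# The incidents associated with a problem fall out of the shared ID.
linked = [i.incident_id for i in incidents if i.problem_id == problem.problem_id]
```

The point of the sketch is only that the incident carries the problem ID, so every later question about a problem can be answered from the incidents attached to it.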

A few mutations occur to me right off the bat. First of all, at a large ISV, the support team for a major product can have up to five levels and hundreds of reps. All of these reps must be able to create and update problem IDs. Problem tracking (on top of the already standard incident tracking) cannot seriously increase the time needed to handle a call, for reasons of both economics and compliance: a tech will not properly use a system that takes extra time out of their day.

So, the first hurdle is adding problem tracking to incident tracking systems. The most obvious solution would be to use a knowledge management or defect tracking system to handle this. Then only a minor modification to the incident tracking system would be needed. This is where the time cost comes into play. How long will it take a tech to find a problem ID for a case? How long to create one? Maybe most importantly, how long to update an existing problem? Finally, does everyone have the rights to quickly update a problem?

I can give you a couple of pitfalls to watch out for when considering these existing systems. Knowledge management systems in a large organization are formal beasts. My company, for instance, requires every article (internal or external) to follow a three-level approval process for publishing. There is a strict ownership chain which prevents others from modifying an article which you have checked out. Defect tracking systems may meet more of the requirements than a knowledge base. They generally allow multiple people to work on a defect. Both types of systems can be slow, and they are rarely tightly integrated into the incident management system.

The requirements become more complex when you change from the perfect information model to a realistic model. When a customer calls in, it may be easy to identify an existing problem, or the problem may not be identifiable on the first call. In another scenario, maybe the product just shipped a new version and customers are calling in with crashes or a specific error message. So, the first time a customer gets an "incompatible version" error, for instance, the workflow should be something like this:
Move a week down the road and there are 25 cases associated with this problem ID. As things move on, it becomes clear that this error has multiple causes. In this case, the problem ID should be split into multiple problem IDs. The old ID could then be retired so that it can no longer be attached to new cases. Other management tasks need to be performed as well, such as marking a problem ID as a duplicate. To me, all this sounds like some kind of franken-wiki. Anyone should be able to open a problem, click edit, and type in their observations about the issue or add data (and remove outdated information).
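The split, retire and duplicate operations could be sketched like this. It is only an illustration under my own assumptions: the status values, the registry dictionary and the function names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Problem:
    problem_id: str
    notes: str = ""                 # free-form, wiki-style observations
    status: str = "open"            # "open", "split", or "duplicate"
    successors: List[str] = field(default_factory=list)
    duplicate_of: Optional[str] = None

    def accepts_new_cases(self) -> bool:
        # Retired IDs (split or duplicate) must not collect new incidents.
        return self.status == "open"


def split_problem(registry: Dict[str, Problem], old_id: str, new_ids: List[str]) -> None:
    """Retire old_id and create one open problem per newly identified cause."""
    old = registry[old_id]
    for new_id in new_ids:
        registry[new_id] = Problem(new_id, notes=f"Split from {old_id}")
    old.status = "split"
    old.successors = list(new_ids)


def mark_duplicate(registry: Dict[str, Problem], dup_id: str, canonical_id: str) -> None:
    """Retire dup_id and point it at the problem that should be used instead."""
    dup = registry[dup_id]
    dup.status = "duplicate"
    dup.duplicate_of = canonical_id


# Hypothetical walk-through: one error turns out to have two causes.
registry = {"P-1": Problem("P-1", notes="incompatible version error")}
split_problem(registry, "P-1", ["P-2", "P-3"])
```

Keeping the retired ID around, with pointers to its successors, means the 25 existing cases still resolve to something useful while new cases are steered to the narrower problems.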

The best investment a company can make in a problem tracking system is search. This is extremely hard to get right. Technical support searches are not normal searches. The search engine must be able to handle a search for "file not found", "18345926", or "*.*", and all of these queries should be treated literally. This means minimizing noise words and considering punctuation in the query. Don't dare introduce a search engine that uses "or" by default, since you aren't guaranteed to have technicians who know when to use "and", "or", or an exact phrase.

There are easy ways to improve search efficiency in the context of a problem tracking system. Relevancy can be determined in a few ways. First, is the problem "fresh", meaning, have cases been associated with it recently? How closely does the configuration data from the current incident match the problem? Is the problem still open and not a dupe or split? If I search for incompatible version, I would want the first result to be a problem "hit" in the last week by a customer using the same product version and OS as my customer. Each relevancy point could be indicated graphically, perhaps with 1-5 stars for freshness, configuration, product version, etc.
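To make the ranking idea concrete, here is a small Python sketch. Everything in it is an assumption for illustration: the record fields, the one-week freshness window and the point values are arbitrary, and the literal match is a plain case-insensitive substring test rather than a real search engine.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List


@dataclass
class ProblemRecord:
    problem_id: str
    text: str
    last_hit: date          # when a case was last associated with it
    product_version: str
    os_name: str
    status: str = "open"    # "open", "split", or "duplicate"


def matches_literally(query: str, record: ProblemRecord) -> bool:
    # No wildcards, no noise-word stripping: "*.*" means the characters *.*
    return query.lower() in record.text.lower()


def relevance(record: ProblemRecord, today: date,
              product_version: str, os_name: str) -> int:
    score = 0
    if today - record.last_hit <= timedelta(days=7):
        score += 2          # fresh: hit within the last week
    if record.product_version == product_version:
        score += 1
    if record.os_name == os_name:
        score += 1
    if record.status == "open":
        score += 1          # not a dupe or a split
    return score


def search(query: str, records: List[ProblemRecord], today: date,
           product_version: str, os_name: str) -> List[ProblemRecord]:
    hits = [r for r in records if matches_literally(query, r)]
    return sorted(
        hits,
        key=lambda r: relevance(r, today, product_version, os_name),
        reverse=True,
    )


# Hypothetical data for the "incompatible version" example.
today = date(2006, 8, 23)
records = [
    ProblemRecord("P-2", "incompatible version after upgrade",
                  date(2006, 5, 1), "4.0", "Windows 2000", status="split"),
    ProblemRecord("P-1", "Error: incompatible version on launch",
                  date(2006, 8, 20), "5.0", "Windows XP"),
    ProblemRecord("P-3", "Cannot open *.* in the file dialog",
                  date(2006, 8, 22), "5.0", "Windows XP"),
]
results = search("incompatible version", records, today, "5.0", "Windows XP")
```

The individual components of the score (freshness, version, OS, status) are kept separate in the function so that each could be shown as its own star rating instead of being folded into one number.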

There are a lot of benefits that can be realized if such a system is tightly bound to the incident tracking system. For instance, at the point where 25 incidents are associated with a problem, there is a lot of data we can gain about the problem just from the natural relationship. What operating systems does the problem tend to affect? Does it typically affect large or small customers? How about new or inexperienced users? Has the problem existed in more than one version of the product?
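Answering those questions is mostly counting. A minimal sketch, assuming each incident is stored as a dictionary with hypothetical keys for problem ID, operating system and product version:

```python
from collections import Counter
from typing import Dict, List

# Hypothetical incidents; in practice these come from the incident tracker.
incidents = [
    {"problem_id": "P-1", "os": "Windows XP", "version": "5.0"},
    {"problem_id": "P-1", "os": "Windows XP", "version": "5.0"},
    {"problem_id": "P-1", "os": "Windows 2000", "version": "4.0"},
    {"problem_id": "P-2", "os": "Windows XP", "version": "5.0"},
]


def profile(incidents: List[Dict[str, str]], problem_id: str) -> Dict[str, object]:
    """Summarize the incidents linked to one problem."""
    linked = [i for i in incidents if i["problem_id"] == problem_id]
    return {
        "count": len(linked),
        "by_os": Counter(i["os"] for i in linked),
        "by_version": Counter(i["version"] for i in linked),
    }


summary = profile(incidents, "P-1")
# More than one key in by_version means the problem spans product versions.
spans_versions = len(summary["by_version"]) > 1
```

The same pattern extends to customer size or user experience level, provided the incident record captures those fields.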

On the macro scale, organizations should mine the heck out of these problem reports. Usability teams, developers and QA should all know exactly what the top 10% of the problems are and should be thinking about how to develop and test around them. In my experience, this already happens, but in a much less formal way. Companies develop and test around less concrete metrics, such as the top knowledge base article, or most referenced defect or error message. The ISV I work for analyzes the knowledge base search queries to find patterns. These techniques are all surrogates for real problem tracking.

Remedy has been a big proponent of ITIL for a long time, and their service desk product is billed as "an incident and problem automated workflow solution". My complaint about Remedy (and ITIL, for that matter) is that it is tailored to the much larger in-house help desk market. While ISV technical support has similar needs, the needs of the in-house help desk will always win with Remedy because that is where the vast majority of their money comes from. I haven't seen how their solution handles problems. Do you have any experience with Remedy? Have a problem with this idea? Comment below and we'll talk about it.

Monday, August 14, 2006


Redirect a port on Linux

I unexpectedly needed to redirect a request from the local LAN to another machine on the local LAN. Don't ask... Actually, this would also come in handy for services which can only be configured to listen on a local address (such as SSH dynamic port forwarding) that you want to export to the world at large. If there is an easy way to do this with iptables, I can't find it (I'm not acting as a gateway to any systems, so rules in the FORWARD chain don't count). A userspace util, rinetd, will accept connections on port A and send them to IP and port B.
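To show what a userspace redirector of this kind does, here is a minimal Python sketch of the same idea: accept a TCP connection on one address and relay the bytes to another. It is not how rinetd is implemented, and the function names (start_forwarder and friends) are made up for this example.

```python
import socket
import threading


def pipe(src, dst):
    """Copy bytes from src to dst until src closes, then half-close dst."""
    try:
        while True:
            data = src.recv(4096)
            if not data:
                break
            dst.sendall(data)
    except OSError:
        pass
    finally:
        try:
            dst.shutdown(socket.SHUT_WR)  # pass the end-of-stream along
        except OSError:
            pass


def handle(client, target):
    """Relay one client connection to the target (host, port)."""
    try:
        upstream = socket.create_connection(target)
    except OSError:
        client.close()
        return
    # One thread per direction: client -> target here, target -> client below.
    t = threading.Thread(target=pipe, args=(client, upstream), daemon=True)
    t.start()
    pipe(upstream, client)
    t.join()
    client.close()
    upstream.close()


def serve(listener, target):
    while True:
        try:
            client, _ = listener.accept()
        except OSError:
            break  # listener was closed
        threading.Thread(target=handle, args=(client, target), daemon=True).start()


def start_forwarder(listen_addr, target):
    """Listen on listen_addr (port A) and forward to target (IP and port B)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(listen_addr)
    listener.listen()
    threading.Thread(target=serve, args=(listener, target), daemon=True).start()
    return listener
```

A call such as start_forwarder(("0.0.0.0", 8080), ("192.168.1.20", 80)), with placeholder addresses, would relay in the background for as long as the calling program keeps running. As far as I know, rinetd expresses the same mapping as one line in its configuration file: bind address, bind port, connect address, connect port.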


Phishing vigilante?

Received an email today from what seems to be a phishing vigilante. Google has no results for it. Interesting. Note the link and link's target.

X-Gmail-Received: ffd7a329f43426f6e21c05d6f0a330c02fbe4c1e
Delivered-To: my
Received: by with SMTP id k13cs309860wxk;
Sun, 13 Aug 2006 16:13:39 -0700 (PDT)
Received: by with SMTP id x18mr8978354wxx;
Sun, 13 Aug 2006 16:13:39 -0700 (PDT)
Received: from ( [])
by with ESMTP id h11si3436504wxd.2006.;
Sun, 13 Aug 2006 16:13:39 -0700 (PDT)
Received-SPF: neutral ( is neither permitted nor denied by best guess record for domain of
X-ORBL: []
Received: from User ( [])
by (8.13.7 out spool5000 dk/8.13.7) with SMTP id k7CMkdAn008485;
Sat, 12 Aug 2006 15:46:40 -0700
Message-Id: <>
From: "Paypal Security Center"
Subject: Accounts Management
Date: Sat, 12 Aug 2006 18:47:10 -0700
MIME-Version: 1.0
Content-Type: text/html;
Content-Transfer-Encoding: 7bit
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2600.0000
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2600.0000

Dear PayPal customer,

As part of our security measures, we regularly screen activity in the PayPal system.
We recently noticed the following issue on your account:

Our system requires further account verification.

Case ID Number: PP-132-378-817

For your protection, we have limited access to your account until additional security
measures can be completed. We apologize for any inconvenience this may cause.
To review your account and some or all of the information that PayPal used to make its
decision to limit your account access, please visit the Resolution Center.


If, after reviewing your account information, you seek further clarification regarding
your account access, please contact PayPal by visiting the Help Center and clicking
"Contact Us". We thank you for your prompt attention to this matter. Please understand
that this is a security measure intended to help protect you and your account. We apologize
for any inconvenience.


PayPal Account Review Department

Copyright 1999-2006 PayPal. All rights reserved.
