Wednesday, August 23, 2006
Mutating ITIL for ISVs and hardware manufacturers
In a previous post, I talked a little bit about how ITIL could help an organization like Shuttle identify and resolve widespread issues that it might not find out about within the confines of traditional technical support. I want to expand on a few concepts. ITIL is a big standard; for now, I am focusing on a small subset - incident and problem tracking.
The basic concept is that every customer call is at least one incident (or case). Every incident is associated with a problem. In a theoretical model where we assume perfect knowledge, a problem would contain complete information about the error messages, other symptoms, associated defects, and resolution.
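Under that perfect-knowledge assumption, the incident/problem relationship is simple enough to sketch. This is an illustrative Python model, not any real product's schema; the field names are my own:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Problem:
    """Perfect-knowledge model: a problem carries complete information
    about the fault - errors, symptoms, defects, and resolution."""
    problem_id: int
    error_messages: list[str]
    symptoms: list[str]
    defect_ids: list[int] = field(default_factory=list)
    resolution: Optional[str] = None  # None until the problem is solved

@dataclass
class Incident:
    """Every customer call is at least one incident (case), and every
    incident is associated with exactly one problem."""
    case_id: int
    problem_id: int

prob = Problem(1, ["incompatible version"], ["crash on startup"])
case = Incident(1001, prob.problem_id)
```

The one-to-many link from problem to incidents is the whole point: everything later in this post falls out of being able to walk that association in both directions.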
A few mutations occur to me right off the bat. First of all, at a large ISV, the support team for a major product can have up to five levels and hundreds of reps. All of these reps must be able to create and update problem IDs. The problem tracking (on top of the already standard incident tracking) cannot seriously impact the time needed to handle a call, for both economic and compliance reasons - a tech will not properly use a system that takes extra time out of their day.
So, the first hurdle is adding problem tracking to incident tracking systems. The most obvious solution would be to use a knowledge management or defect tracking system to handle this; then only a minor modification to the incident tracking system would be needed. The time-consuming factor may come into play here. How long will it take a tech to find a problem ID for a case? How long to create one? Maybe most importantly, how long to update an existing problem? Finally, does everyone have rights to quickly update a problem?
I can give you a couple of pitfalls to watch out for when considering these existing systems. Knowledge management systems in a large organization are formal beasts. My company, for instance, requires every article (internal or external) to follow a three-level approval process for publishing, and a strict ownership chain prevents others from modifying an article that you have checked out. Defect tracking systems may meet more of the requirements than a knowledge base; they generally allow multiple people to work on a defect. Both types of systems can be slow, and rarely are they tightly integrated into the incident management system.
The requirements become more complex when you change from the perfect information model to a realistic model. When a customer calls in, it may be easy to identify an existing problem, or the problem may not be identifiable on the first call. In another scenario, maybe the product just shipped a new version and customers are calling in with crashes or a specific error message. So, the first time a customer gets an "incompatible version" error, for instance, the workflow should be something like this:
- Tech enters basic information - product, version, OS
- Tech initiates a search for "incompatible version"
- Problem system searches within the entered parameters and returns no results
- Tech enters what he knows - at this point, just the error - and a new problem is created. In a tightly bound system, much of the data should come from the case, such as product version and other configuration data.
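The steps above can be sketched as a single find-or-create operation. This is a hypothetical in-memory stand-in for the problem tracking system, just to show the flow:

```python
problems = []  # in-memory stand-in for the problem tracking store

def find_or_create_problem(case, query):
    """Search within the entered parameters; if nothing matches,
    create a new problem seeded from the case's configuration data."""
    for p in problems:
        if query in p["error"] and p["product"] == case["product"]:
            return p  # existing problem: associate this case with it
    # No results: create a new problem, pulling product/version/OS
    # straight from the case rather than making the tech re-key them.
    p = {"error": query, "product": case["product"],
         "version": case["version"], "os": case["os"], "cases": []}
    problems.append(p)
    return p

case = {"product": "Widget", "version": "2.0", "os": "Windows XP"}
p = find_or_create_problem(case, "incompatible version")
p["cases"].append(case)
```

The second customer who calls in with the same error on the same product gets routed to the existing problem rather than spawning a duplicate - which is exactly the behavior that keeps the per-call overhead near zero.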
The best investment a company can make in a problem tracking system is search, and search is extremely hard to get right. Technical support searches are not normal searches. The search engine must be able to handle a query for "file not found", "18345926", or "*.*", and all of these queries should be treated literally. This means minimizing noise words and considering punctuation in the query. Don't dare introduce a search engine that defaults to "or" matching, since you aren't guaranteed to have technicians who know when to use AND, OR, or an exact phrase.

There are easy ways to improve search relevancy in the context of a problem tracking system. Relevancy can be determined in a few ways. First, is the problem "fresh" - that is, have cases been associated with it recently? How closely does the configuration data from the current incident match the problem? Is the problem still open, and not a dupe or a split? If I search for "incompatible version", I would want the first result to be a problem "hit" in the last week by a customer using the same product version and OS as my customer. Each relevancy point could be indicated graphically - perhaps 1-5 stars each for freshness, configuration, product version, and so on.
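Here is one way the per-axis star ratings could be computed. The field names and thresholds are illustrative assumptions, not taken from any real product:

```python
from datetime import datetime, timedelta

def relevancy_stars(problem, incident, now):
    """Score a search hit 1-5 stars on each relevancy axis:
    freshness, configuration match, and problem status."""
    stars = {}
    # Freshness: how recently was a case associated with this problem?
    days_since_hit = (now - problem["last_hit"]).days
    stars["freshness"] = max(1, 5 - days_since_hit // 7)  # lose a star per week
    # Configuration: does the incident's environment match the problem's?
    stars["os"] = 5 if problem["os"] == incident["os"] else 1
    stars["version"] = 5 if problem["version"] == incident["version"] else 1
    # Status: open problems outrank dupes and splits.
    stars["status"] = 5 if problem["status"] == "open" else 1
    return stars

now = datetime(2006, 8, 23)
problem = {"last_hit": now - timedelta(days=3), "os": "Windows XP",
           "version": "2.0", "status": "open"}
incident = {"os": "Windows XP", "version": "2.0"}
stars = relevancy_stars(problem, incident, now)
```

A problem hit three days ago by a customer on the same OS and version scores five stars across the board, so it sorts to the top - which is the behavior I described above.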
There are a lot of benefits that can be realized if such a system is tightly bound to the incident tracking system. For instance, at the point where 25 incidents are associated with a problem, there is a lot of data we can gain about the problem just from the natural relationship. What operating systems does the problem tend to affect? Does it typically affect large or small customers? How about new or inexperienced users? Has the problem existed in more than one version of the product?
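Once the incidents hang off the problem record, those questions reduce to simple aggregation over the association. A minimal sketch, with made-up incident data:

```python
from collections import Counter

# Hypothetical: 25 incidents associated with one problem.
incidents = (
    [{"os": "Windows XP", "version": "2.0", "customer_size": "large"}] * 15
    + [{"os": "Windows 2000", "version": "1.5", "customer_size": "small"}] * 10
)

# Which operating systems does the problem tend to affect?
os_counts = Counter(i["os"] for i in incidents)
# Does it typically hit large or small customers?
size_counts = Counter(i["customer_size"] for i in incidents)
# Has the problem existed in more than one version of the product?
versions = {i["version"] for i in incidents}
spans_versions = len(versions) > 1
```

None of this requires the tech to enter anything extra; the answers fall out of data the incident system already captures.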
On the macro scale, organizations should mine the heck out of these problem reports. Usability teams, developers and QA should all know exactly what the top 10% of the problems are and should be thinking about how to develop and test around them. In my experience, this already happens, but in a much less formal way. Companies develop and test around less concrete metrics, such as the top knowledge base article, or most referenced defect or error message. The ISV I work for analyzes the knowledge base search queries to find patterns. These techniques are all surrogates for real problem tracking.
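The mining pass itself is trivial once real problem tracking exists. A hypothetical example, ranking problems by associated-incident count and taking the top 10% for usability, dev, and QA review (the problem IDs and counts are invented):

```python
# Map of problem ID -> number of associated incidents.
problem_case_counts = {"P-101": 420, "P-102": 37, "P-103": 12,
                       "P-104": 9, "P-105": 8, "P-106": 5,
                       "P-107": 4, "P-108": 3, "P-109": 2, "P-110": 1}

# Rank by incident volume, highest first.
ranked = sorted(problem_case_counts, key=problem_case_counts.get, reverse=True)

# The top 10% are the ones to develop and test around.
top_ten_percent = ranked[:max(1, len(ranked) // 10)]
print(top_ten_percent)  # -> ['P-101']
```

Compare this with the surrogates: a top knowledge base article or a frequently referenced defect only approximates the incident volume that the problem association measures directly.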
Remedy has been a big proponent of ITIL for a long time, and their service desk product is billed as "an incident and problem automated workflow solution". My complaint about Remedy (and ITIL, for that matter) is that it is tailored to the much larger in-house help desk market. While ISV technical support has similar needs, the needs of the in-house help desk will always win with Remedy because that is where the vast majority of their money comes from. I haven't seen how their solution handles problems. Do you have any experience with Remedy? Have a problem with this idea? Comment below and we'll talk about it.