Coradiant

Blog

Handling the Truth

December 24th, 2008 Posted by: Jonathan Ginter

Coradiant’s TrueSight End-User Experience Management product evokes a number of interesting reactions when we first start monitoring a customer’s Web traffic.  One of the most common reactions is amazement at just how many bugs exist – even in the best Web applications.  One customer speaking at a luncheon described the experience as being similar to turning on the light in your apartment and seeing big ugly cockroaches everywhere – you are appalled, you are embarrassed … and you feel a strong urge to simply turn off the light.

This might seem like a damning statement to make about one’s own environment.  And yet we repeatedly hear how such confrontations with the ugly truth have provided insights that resulted in the correction of long-standing problems, some of which had never even made it onto the radar of Web Operations.  I think you would have to struggle to find a Coradiant customer that did not have a similar story to tell. One customer discovered that 30% of their traffic consisted of cache hits (where the server reports that nothing has changed) or redirects.  By simply tweaking the caching parameters returned by their web servers, they reduced the load on their servers significantly.  Another customer discovered that some pages were taking up to 1.5 minutes to be handled by the server before a response was being sent back to the browser.  Yikes.

How many users are hitting your site?  How many errors are being returned?  How slow are the pages?  How reliable is the network?  Some customers are clearly floundering without any real ability to answer these fundamental questions.  Other customers believe they already have a solid handle on such issues.  We have found that almost all of them have a real shock in store.  Some of our most loyal customers are those that firmly believed they knew the truth already.

Often, though, the insight doesn’t have to be that deep to be a revelation.  It never ceases to amaze me how often web sites are thrown over the fence to be supported by a team that hasn’t the first clue about what they have taken on.  We offer a fairly simple feature that reports lists of traffic attributes sorted by popularity – e.g., URLs, hosts, client IP address blocks, geographic regions, cookie keys, etc.  Our customers can define their own fields as well, pulling whatever they would like out of the traffic to do so (e.g., database error codes, product IDs, etc).  We use this feature to help populate configuration fields.  However, the contents of those lists proved to be such a revelation to our customers that we re-categorized the feature under “Reports”.  Using this simple feature, one of our customers noticed that we were seeing internal traffic that we should not have been able to see.  This led him to realize that his routers were improperly configured.

The customer that we had invited to speak at our luncheon finished off his presentation by advising others – somewhat jokingly – to consider carefully whether they were truly ready to handle the truth about their traffic.  Ugly as it may be, facing it can reveal real solutions to real problems.  I highly recommend it.

The Benefits of Immediate Data

November 21st, 2008 Posted by: Jonathan Ginter

As the world moves on-line for most of its social and business interactions, it becomes more and more important for us to be able to react quickly when the systems that support those interactions exhibit problematic behavior.  Since problematic behavior is not always reflected by the health of your infrastructure, this has to be measured from the end user’s perspective.  In other words, if the end user’s experience degrades in any way, the application has become problematic.

This can present quite a problem on several fronts:

  • Measuring the end user’s experience
  • Being notified quickly that the user’s experience has degraded
  • Discovering that a potential fix has failed to address the problem so that it can be rolled back before too many users are negatively affected

As an example, let’s imagine that the on-line store on one of our web servers suddenly experiences an internal problem, which causes its performance to tank with no outward sign of distress (e.g., no log entries).  Traditional methods of detection and notification will not work for us here.

Moreover, time is of the essence.  In our various field deployments, I have noticed that having 5000 users on your site every hour – on average – is quite common.  In fact, some sites have been known to average about 100k users every hour.  So if it takes you an hour to even notice that you have a problem, you have already upset quite a few users.  You need to be able to react quickly.

Challenge #1

How do we detect the problem?  We need to monitor the end user’s experience and it needs to happen in real-time.  This challenge has become less of a problem as various End User Experience Management (EUEM) tools have emerged to address this, some more successfully than others and each with its own unique feature set.  However, this is not the immediate focus of this article.  So, let’s assume that we already have such a tool in place.

Challenge #2

How quickly can we be notified that the user’s experience has gone south?  That depends upon the immediacy of the data and that depends largely on the tool that’s been chosen.

When an application starts to collapse, there are typically two major symptoms:

  • A drop in performance
  • A drop in volume as users abandon the application

If we typically receive 5000 users per hour in our on-line store, we can assume that 80 or more users are negatively impacted every minute.  Moreover, so far we’re only talking about being notified.  Once that happens, we will still have to analyze and deal with the problem.  All the while, the problem on the site is spreading to more and more users.
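The arithmetic behind that estimate can be sketched in a few lines. This is only a rough illustration: it assumes users arrive evenly across the hour, and the function name and the sample lag times are made up for the example.

```python
def users_affected(users_per_hour: int, minutes_to_notice: float) -> float:
    """Approximate number of users who hit the site before anyone is notified.

    Assumes arrivals are spread evenly over the hour (a simplification).
    """
    return users_per_hour * minutes_to_notice / 60


if __name__ == "__main__":
    # 5000 and 100k users per hour are the traffic levels mentioned in the article;
    # the lag times are illustrative.
    for rate in (5_000, 100_000):
        for lag_minutes in (1, 5, 60):
            affected = users_affected(rate, lag_minutes)
            print(f"{rate:>7} users/hour, {lag_minutes:>2} min lag: ~{affected:,.0f} users affected")
```

At 5000 users per hour, each minute of delay costs roughly 83 users, which is where the "80 or more" figure comes from.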

Assuming that the problem may only be noticeable as a trend, waiting for several minutes for enough data to be gathered to predict the trend might be necessary.  However, the lag time should be kept to that order of magnitude.  Waiting for an hour or more to be notified should be completely unacceptable.

Moreover, if the problem can be detected from a single hit on the site – e.g., the application is throwing back pages with error codes embedded in them – then the notification should be almost immediate.  The lag time from seeing a hit on the wire to the time that an alert can be sent about that hit should be within a few minutes, at most.
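As a rough sketch of what single-hit detection could look like, the check below flags a hit as soon as it is seen. The error markers and the function name are hypothetical and are not taken from any particular product; a real deployment would tune them to its own application stack.

```python
import re

# Hypothetical markers of an application error embedded in an otherwise
# "successful" page. These are illustrative assumptions only.
ERROR_MARKERS = re.compile(r"ORA-\d{5}|NullPointerException|Internal Server Error")


def hit_needs_alert(status: int, body: str) -> bool:
    """Return True if this single hit is enough evidence to raise an alert."""
    if status >= 500:
        # The server itself reported a failure.
        return True
    # A 200 response can still carry an error message inside the page.
    return ERROR_MARKERS.search(body) is not None
```

Because the decision needs nothing but the hit itself, there is no reason to wait for a trend before sending the notification.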

Challenge #3

Immediacy of data is also a concern when a fix is being rolled out and we need to validate that the problem has truly been addressed.  Rolling out a fix and waiting for an hour to gather the results is unacceptable in this day and age.  The only organization that should be willing to accept such a lag time is NASA (and at least they have a good reason for it).  If potential fixes cannot be validated within minutes, then users are being treated like piñatas.  Generally speaking, users don’t appreciate that.

You own that data.  You deserve to have access to it as fast as possible.  Your users will thank you.

Using Existing Technology to Track Users Reliably

August 29th, 2008 Posted by: Jonathan Ginter

I spend a portion of my time working with Coradiant customers on session tracking strategies that ensure a complete view of the end-user experience. Session tracking gives us the ability to see clearly every action, every page and every object associated with a user. Proper session tracking achieves what we refer to as user awareness. This is different from identity awareness, which allows us to put a face to the session. I’ve written in more detail about the differences between identity awareness and user awareness, but I can recap briefly for the purpose of this article.

Identity awareness means that you know the unique ID of a specific user on the site, allowing you to put a name (or even a face) to that user’s activity – e.g., user XYZ can now be referred to as “Jonathan” or “jginter”.  Oddly enough, this is fairly simple since it only requires that users identify themselves (by providing an ID) at least once during their session – e.g., as part of a login procedure, etc.  If even one hit during the session carries the user’s personal ID, we can be configured to pull it out and place it on the session we are tracking.

I just said a mouthful and didn’t underscore it, though.  I said that we were tracking a session already, which means we would have already achieved user awareness.  To do this is no small feat, since it requires that we be able to identify the subset of hits that represent the activity of a single unique user.  This is the foundation of any good monitoring solution.  Moreover, we must find these related needles out of the haystack that is the river of traffic we are monitoring.  Ironically, this could be very simple, but most sites make it very hard.  However, we can resolve this through a couple of simple suggestions.
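To make the distinction concrete, here is a minimal sketch of the two steps, assuming each hit has already been parsed into a small record. The `Hit` and `Session` types and their field names are hypothetical. User awareness comes from grouping hits by session ID; identity awareness is layered on top when any one hit in the session carries a personal ID.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Hit:
    url: str
    session_id: Optional[str]       # session cookie value, if the hit carried one
    user_id: Optional[str] = None   # personal ID, e.g. seen on a login hit


@dataclass
class Session:
    hits: List[Hit] = field(default_factory=list)
    user_id: Optional[str] = None


def build_sessions(hits: List[Hit]) -> Tuple[Dict[str, Session], List[Hit]]:
    """Group hits into sessions; return the sessions and the anonymous leftovers."""
    sessions: Dict[str, Session] = defaultdict(Session)
    anonymous: List[Hit] = []
    for hit in hits:
        if hit.session_id is None:
            anonymous.append(hit)   # nothing on the hit says who it belongs to
            continue
        session = sessions[hit.session_id]
        session.hits.append(hit)
        # One identified hit is enough to put a name on the whole session.
        if hit.user_id and session.user_id is None:
            session.user_id = hit.user_id
    return dict(sessions), anonymous
```

Any hit that arrives without a session ID ends up in the anonymous list, which is exactly the traffic that cannot be tied back to a user.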

As I have said before, most sites host traffic that is unintentionally anonymous – i.e., there is nothing on the hit that would identify who it belonged to.  Even sites that are diligently trying to place identifiers in their traffic are likely to have anonymous traffic that they are not aware of.  Why is that?  The simple fact is that most sites suffer from tunnel vision.  They focus on their most important traffic – JSP, HTML, ASP, etc.  These hits represent their application, as they perceive it.  When they track a user’s progression through their site, these are the URLs that they care most about.  What are often overlooked are the support files – stylesheets, javascript, images, etc.  Although secondary in importance, they can be critical when it comes to understanding the performance seen by the user.  Organizations must solve this problem in order to monitor their traffic properly.

There are two basic problems that need to be addressed:

  1. Failing to use cookies to identify sessions
  2. Failing to control the domain of those cookies

Whatever identifier is being used on the primary URLs, it should be present on all secondary URLs as well.  For those applications that use something other than a cookie to carry session IDs, this can be a challenge if not downright impossible.  Switching to cookies allows the browser – the most authoritative source for identifying a unique user – to properly label every hit using a natural aspect of the protocol.  Even when browsers do not support cookies, web servers have been designed to rewrite the URLs in the content being sent so that the cookie values are embedded in the URL paths themselves.  The beauty of this is that browsers and web servers are designed to do this with little effort on the part of the application developer.  All that is required is that the web server be configured to open a new session for any traffic it sees – something that is not usually done if the developers are trying to maintain a stateless application.

Recommendation #1: all web servers should be configured to track sessions even if the application is stateless.  Opening a session does not mean you are required to maintain any state.

So now your web servers are diligently placing a session cookie on all traffic that they are serving up, which will solve the problem … most of the time.  You can still be easily defeated by deployments in which some or all of the secondary files are hosted on an alternate server.  This is a problem because it places those files in a different domain and cookies are sensitive to domains (and paths).  It is considered a bad practice to ask browsers to send cookies to servers that don’t care about them (it bloats the network traffic and forces web servers to do extra work for nothing).  Therefore, web servers often configure their cookies so that browsers only send them back to the servers (i.e., domains) that issued them.  Thus, browsers usually do not send cookies set by the primary server to the secondary servers.  This is easy to fix, however, by changing the configuration of the web servers.  Most of the time, the secondary servers share the same root domain – e.g., www.coradiant.com vs. images.coradiant.com – so a single cookie domain that covers both hosts (.coradiant.com) can be associated with the session ID cookie, allowing the browser to send it on all traffic to the monitored site.  (A cookie’s domain cannot span different top-level domains, so a host such as images.coradiant.ca would not be covered.)  Widening the cookie domain in this way will ensure that all secondary traffic is also clearly tagged with a unique session ID.  Although this goes against best practice, the session ID cookie is typically very small and the best practice was meant to avoid needlessly sending multiple cookies and / or fat cookies with heavy payloads.  Moreover, what I am suggesting is hardly “needless”, since it solves a significant problem.

Recommendation #2: define domain patterns that cover your whole site and associate those patterns to the cookie(s) that will be carrying the session IDs.  Each web server vendor may do this differently, so consult the administration manuals for your specific web servers for more information on how this can be done.
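As a vendor-neutral illustration, the sketch below builds a session cookie with a widened Domain attribute using Python’s standard `http.cookies` module, then checks which hosts a browser would send it to. The cookie name `SESSIONID`, the helper names and the host names are assumptions made for the example. Note that the standard Domain attribute has no wildcard syntax and cannot span top-level domains: it covers one domain and the hosts beneath it.

```python
from http.cookies import SimpleCookie


def session_cookie_header(session_id: str, domain: str) -> str:
    """Build the value of a Set-Cookie header for a site-wide session cookie."""
    cookie = SimpleCookie()
    cookie["SESSIONID"] = session_id       # hypothetical cookie name
    cookie["SESSIONID"]["domain"] = domain
    cookie["SESSIONID"]["path"] = "/"      # send on every path, not just the app's
    return cookie["SESSIONID"].OutputString()


def domain_matches(host: str, cookie_domain: str) -> bool:
    """Simplified version of the browser's rule for sending a cookie to a host."""
    host = host.lower()
    cookie_domain = cookie_domain.lower().lstrip(".")
    return host == cookie_domain or host.endswith("." + cookie_domain)


if __name__ == "__main__":
    print("Set-Cookie:", session_cookie_header("abc123", "coradiant.com"))
    for host in ("www.coradiant.com", "images.coradiant.com", "images.coradiant.ca"):
        print(host, "receives the cookie:", domain_matches(host, "coradiant.com"))
```

With the domain widened to the shared root, stylesheets and images served from a sibling host carry the same session ID as the primary pages.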

Reconstructing sessions reliably does not have to be complicated, but people often get in their own way.  These simple recommendations should help solve a world of problems, no matter what monitoring solution you are using.

Analyzing the End-to-End Challenge

June 20th, 2008 Posted by: Jonathan Ginter

Julie Craig from Enterprise Management Associates published a very interesting article entitled “The End-to-End Challenge”. In this article, she reveals some disturbing statistics, among which were the following (I am paraphrasing here):

  • 43% of application outages are still reported by users
  • 37% of IT professionals lack the tools they need to support their business applications (even though unrelated research reports that IT organizations are using anywhere from 5 to over 25 management tools)
  • 41% of IT organizations prefer to use “expert opinion” to diagnose problems

 

Although I believe the rate of user-reported issues is much higher, I note that she used the term “outages”, so it is possible that she is only referring to actual downtime and not slow performance or other types of errors. If this is, in fact, a correct interpretation of her meaning, it makes her estimate even more ominous for IT organizations. If Ms. Craig is correct, then an area where IT departments considered themselves to be fairly proficient – the detection of downtime – is proving to be more flawed than previously believed.

However, what caught my eye most were the subsequent assertions. More than a third of IT professionals feel that they are poorly equipped to monitor their web applications even though they are – for the most part – drowning in tools. Ms. Craig goes on to point out that almost half of the IT departments surveyed were relying on their resident experts to figure out what was wrong. I can’t help but feel that this is a direct result of losing faith in the wealth of available tools. When the tools are not doing the job, it is a natural reaction to fall back on the human factor.

So why are all of these tools failing to do the job? Ms. Craig clearly believes that the problem is with end-to-end visibility. However, I disagree for a very simple reason: this fails to address the rate of user-reported outages. Users cannot see the full end-to-end and yet they are more efficient at noticing problems than the IT department. If you want to be as good as your users, you have to be able to see how they are being affected by your applications. You need to see your users’ experience.

And that is what is wrong with most tools out there today. They look at the infrastructure instead of the users. If you can’t see the negative impacts on your users, then all of your other monitoring is rather pointless, since it doesn’t help to support the bottom line of making your users happy.

And let’s be clear. You want to see what is happening to all of your users, not just one or a handful. You have to focus on the forest and not the trees.

It’s nice to see the end-to-end picture, but that is only useful after you have won the war of finding more problems than your users do.

75% of Today’s Online Recruiting Leaders Use Coradiant TrueSight™ Products for End-User Experience Management

June 19th, 2008 Posted by: Tony Tissot

Gartner released their June 2008 version of their venerable “Magic Quadrant” ranking for E-Recruitment Software.

Six of the eight leaders are already Coradiant customers.

This is a clear indication that leaders in this burgeoning field are taking End-User Experience Management seriously.

Coradiant is consistently chosen by leaders in online HR and in a number of other industries whose business relies on web applications.

With Coradiant TrueSight, businesses know how they are treating every single one of their web visitors, whether they are a small business interfacing with a few high valued users or a large enterprise interacting with hundreds of individuals online every second.

Leaders in E-recruitment software and HR Software-as-a-Service businesses are all focused on providing a high-quality end user experience. And Coradiant is the overwhelming choice among the leaders for End User Experience Management. Gartner ranked leading vendors on the completeness of their model and on their ability to execute.

Copies of Gartner’s E-Recruitment Software Magic Quadrant report are available from Gartner, Inc. (www.gartner.com).

User Recognition in the Evolving Web

June 13th, 2008 Posted by: Jonathan Ginter

The holy grail of web monitoring – whether for real user experience or web analytics or any other purpose – is to be able to reliably recognize users.  You can only accomplish this goal by inspecting the traffic itself.   However, as I pointed out in a previous posting, you will always be as blind as your own applications.

Naturally, applications will only inject such identifiers when they are interested in identifying the user in some fashion.  Not all of them are.  Moreover, due to the undisciplined nature of web development, some of them are horrendously inconsistent in their intentions.

There are really three levels of user awareness:

  • Identity: an application is identity-aware if it requires users to authenticate themselves in some fashion.  This is typical of on-line banking, insurance sites, etc.
  • User: an application is user-aware if it tracks the user’s session but cannot actually identify the user.  For example, most on-line stores allow anonymous users to buy items from a catalog, requiring a session state so that the site can remember the contents of the shopping cart.
  • Anonymous: these applications do not track the user specifically and they do not place anything unique in the traffic.

Most sites that are interested in tracking users believe that they are either identity-aware or user-aware.  In fact this is frequently not the case.  Some applications appear to be user-aware because they use cookies to remember preferences or state.  However, none of those cookies are unique to a given user, so those applications are actually entirely anonymous.  Most other sites are at least partially anonymous, failing to place anything unique in the traffic unless absolutely required.  Thus, users are allowed to remain anonymous through large sections of those sites, only becoming traceable when they enter a transaction or log into the application.

Anonymous users cannot be tracked.  The HTTP protocol does not do it natively and the evolution of the internet is only making that more apparent.  If you are allowing traffic to be anonymous, then you must either accept that you cannot track that traffic reliably (you can make reasonable attempts using things like the client IP, but it will be seriously flawed) or you must alter your applications to build in the level of awareness that you need.

Is User Identification Hopelessly Broken?

June 4th, 2008 Posted by: Jonathan Ginter

The Web Analytics industry is in the midst of a debate about how to identify and count Unique Users.  Some people are starting to suggest that we should abandon the idea of Unique Users in favor of counting something easier.  At the heart of that debate is the question of whether we will ever be able to uniquely identify users on the web.
 

Surely I can trust the client IP?

The problems with the client IP have been public knowledge for a long time.  This is an excerpt from a tutorial about Web Analytics, published by Summary.net (a log analysis tool) back in 2002:
“The majority of Internet users connect through dial-up services of some kind. In order to preserve IP numbers (there are a limited number available right now), the dial-up providers will assign each user a number when he connects and then reuse the number when he is done with it. So a dial-up service may have 100 IP numbers that they select from and use to serve 2000 users. This gets even more complicated with caches and proxies that many providers now use to improve performance …”
With the introduction of mega-proxies (like AOL), this problem gets even worse.  Mega-proxies will spray their traffic across multiple gateways.  Since the internet was designed to treat each hit as a stand-alone transaction, this means that every request making up a single page can be routed through a different client IP and port.  So, instead of a one-to-many relationship between the IP and the users, we have a many-to-many relationship.
Entities like corporate firewalls are rendering the client IP extremely weak and unreliable as a user identifier.  Mega-proxies completely destroy its reliability.
 

What about the user agent?

A lot of people choose to set aside the concern about mega-proxies and talk about combining the user agent with the client IP as a differentiator.  The problem with this is that there are a finite number of user agents in the world.  Admittedly, they number in the thousands.  However, these user agents are shared by millions of web users, which means that tons of users are being represented by the same user agents.  In fact, this understates the problem since most people are running the same browsers and plugins on the same basic operating systems, reducing the pool of popular user agents.  Combining IP and user agent will still result in users that are sharing the same combination.
If you are expecting to use this as a means for identity tracking – as in “this is Bob” – then you are going to be disappointed.  Since the user agent contains information about the browser and the OS, it can easily mutate over time as users upgrade their browser, download plugins, install service packs, etc.  Moreover, users are not limited to one browser – I use Firefox but am occasionally forced to use IE on specific sites – or one system.  I surf from my laptop, my wife’s computer and my iPod, so I’m using three different platforms as well as three different browsers.
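A small sketch shows both failure modes of the IP-plus-user-agent key. The hits are invented for illustration, and the third field stands in for the “true” user, which a real monitor would not know without an explicit identifier such as a session cookie.

```python
from collections import defaultdict

# Hypothetical hits: (client_ip, user_agent, true_user).
HITS = [
    ("203.0.113.7", "Firefox/3.0 (Windows)", "alice"),
    ("203.0.113.7", "Firefox/3.0 (Windows)", "bob"),    # same firewall, same browser
    ("198.51.100.1", "MSIE 7.0 (Windows)", "carol"),
    ("198.51.100.2", "MSIE 7.0 (Windows)", "carol"),    # mega-proxy: new gateway
]


def users_per_key(hits):
    """Map each (IP, user agent) pair to the set of real users behind it."""
    users = defaultdict(set)
    for ip, agent, user in hits:
        users[(ip, agent)].add(user)
    return dict(users)


def keys_per_user(hits):
    """Map each real user to the set of (IP, user agent) pairs they appear under."""
    keys = defaultdict(set)
    for ip, agent, user in hits:
        keys[user].add((ip, agent))
    return dict(keys)


if __name__ == "__main__":
    for key, users in users_per_key(HITS).items():
        print(key, "->", sorted(users))
    for user, keys in keys_per_user(HITS).items():
        print(user, "appears under", len(keys), "key(s)")
```

In this sample, one key merges two users who sit behind the same firewall, while one user is split across two keys by a mega-proxy. This is the many-to-many relationship in miniature.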
 

Enter the plugin

At this point, you may be thinking that the user agent will at least improve your odds.  This would be true if it weren’t for plugins.  Plugins within a browser are allowed to request their own resources from the server.  When they do so, they send a user agent and it does not have to be the same one used by the browser.  The Java plugin is a classic example.
 

Grab your bootstraps and pull

This problem – as with all others – begins at home.  If you want to track users, do not expect the HTTP protocol to help you.  It was originally designed for anonymous traffic.  Deploy your own tracking IDs that are tailored to your needs.  Most web servers have mastered the art of injecting user awareness into the traffic (via cookies or URL-rewriting).  If you need identity awareness, then you need to take the next step and have your developers build that into your application.
There is no magic bullet.  You need to solve this problem for yourself.

The times, they are a-changing …

May 28th, 2008 Posted by: Jonathan Ginter

I’ve been thinking a lot about a company that we met with recently. Their web site is one of many media that they use to promote their products. Each medium lends its particular talents to the promotion of the products. They drive their traffic across these multiple channels as a means of increasing sales.

Given the immediacy of the web and their other channels, their promotion campaigns are incredibly brief – typically on the order of a few hours. Moreover, they often run more than one campaign per day. They have engaged their customer base extremely well and are using the immediacy of their channels to drive revenue sky-high.

However, each campaign requires new content for their web site. Moreover, that content must be removed when the campaign is over. Think about that for a minute. They are altering the content of their site several times a day, every day. It’s almost an hourly release cycle. As an IT department, what would that rate of change do to you?

Oh look, here comes the tide …

 

And this trend is not just limited to web content. The release cycle is shrinking on all fronts – infrastructure changes, application updates, etc. As businesses try to tap into the immediacy of the internet, they are going to expect their IT department to be equally nimble, moving swiftly to add servers for increased capacity, deploy new content to support campaigns or apply software upgrades to resolve issues. Where this type of activity used to be allotted several months to ensure quality before deployment, we are now seeing that reduced to weeks or even – in the case of this one company – to a matter of hours.

In the IT world, change = instability and instability leads to support problems and customer calls. As I’ve mentioned before (see Do you know who your users are?), IT departments will typically only catch 3% of the problems before their customers are affected by them. That’s not a great track record. If the rate of change continues to increase, so will the number of problems until the IT department is in danger of drowning completely. Many IT departments we have met with are already in serious trouble. They can’t afford to have things get any worse. What are they expected to do?
 

Perpetual beta

 

Welcome to the world of the “perpetual beta” where you can leave your obligations at the door.

The need to accommodate an increased rate of change was the main reason that the term “perpetual beta” was coined. It effectively announced to the world that “you should expect this product to have problems, so don’t get too upset or hassle us about it”. Techies love this term because it absolves us of our obligation to provide good service and reliable products, allowing us to focus on being “innovative” instead – as though the two concepts were incompatible. In my opinion, the need for such a term is actually a deplorable testament to our inability to find and fix problems when it comes to web applications.

The term is already being applied to public web sites (with Google leading the charge). However, we are increasingly at risk of applying this term to corporate web sites. Imagine if the site that handled your on-line banking were a “perpetual beta” site. “Whoops, we just lost $3000 of your hard-earned cash. We’re so sorry, but this is a beta. We’ll get right on that, assuming it does not stop us from rolling out our next great feature.”

Not exactly awe-inspiring, is it?
 

And it’s slow, too …

 

The other main issue that is not discussed very often is performance. Even if you are diligent about finding and resolving crashes and obvious flaws, constant change can also affect performance. And performance is the hardest aspect to monitor effectively. It is usually the last thing that is tested by a QA department. In an accelerated release cycle, it is typically the first activity to be reduced or entirely cut from a schedule.

We have an existing customer whose IT department does not have authority over the deployment of new content, although they are expected to support it once it is rolled out (sadly, this is a fairly typical arrangement). On one particular day, the IT department started getting an enormous number of complaints about the performance of the site. After consulting TrueSight, they noticed that the content had changed. So they called the marketing department. It turns out that the marketing department had rolled out a “new look” for the site that included a lot of high-resolution graphics. Suddenly, delivering the content of the site to the user was like shoving an elephant through a garden hose. The IT department, of course, had never been informed of this since “nothing important was changed, so it shouldn’t make a difference”. Well, that’s comforting. At least we can warm ourselves in that glow while the phones are ringing off the hook.

If you have signed Service Level Agreements with your customers, I’m looking at you. Rather pointedly. You can also include yourself if poor performance tends to cause an increase in support calls (although the SLA victims are worse off, believe me).

All of this to say that any change – any at all – can cause significant problems and cost you real money.
 

Slaying the beta beast

For obvious reasons, corporations cannot – and should not – condone “perpetual beta” status on their web presence. If you cannot declare “perpetual beta” status, what can you do in the face of such a rapid rate of change? How do you take your life back?

You cannot control the QA cycle, so you must work with the hand you are dealt. Given the trend in release cycles, it is increasingly likely – in spite of the best intentions – that you will be expected to support releases that have less and less quality. The onus will be increasingly on you to sniff out and address poor quality before anyone is affected.

To accomplish this, you need the following abilities:

  • To visualize what is happening on your site as it happens, in real time
  • To monitor changes in error rates, traffic levels and performance
  • To provide proof of culpability to other teams and departments

 

With these basic tools, you can take a less-than-tasty rollout, find its primary flaws and schedule fixes before anyone picks up the phone.

Driving quality

Several of our customers have started to use TrueSight’s abilities in these areas to drive greater quality into the development process. One customer used TrueSight’s alerting capabilities to send an email to the entire development team whenever a customer clicked on a broken link. You can imagine the backlash that ensued at first, as developers demanded that the spamming cease and desist. The IT department stuck to its guns with the simple point that the developers could stop the email themselves by fixing the broken links. And guess what? Miracle of miracles, the links were fixed! More importantly, after the initial storm blew over, the developers were intrigued with the possibility of having more direct feedback from production. They now work more closely with that IT department towards producing better quality.

Simple exposure of flaws – with hard evidence to support those assertions – can be a powerful tool in changing the dynamics of departmental relationships and molding corporate policy when it comes to quality.

And we desperately need that type of change in this industry.

Do you know who your users are?

April 20th, 2008 Posted by: Jonathan Ginter

 

When the web first appeared on the IT scene, it seemed like a wonderful solution to the main problem facing IT at the time – how to push software upgrades onto hundreds (or thousands) of desktops.  It reduced the supported infrastructure down to a single set of web servers, etc.  Unfortunately, as applications were migrated to the web, IT departments lost their relationship with the users and help desks were put in place to take over that role.  Some IT departments may even have considered this to be a benefit.  At first.  They completely failed to see the down side before the storm hit.

Who are you and what are you doing to my web site?

It is important to note that up until this point, the IT department enjoyed an intimate relationship with their users.  They knew everyone that was using the application and could call them up directly, if necessary.  More importantly, the users knew the IT staff and could speak to them directly, if they wished.  The web took away that vital relationship.

Web users are a faceless mob – completely anonymous whenever they choose to be.  Even if they are registered users, you may be forced to consider them as “anonymous” due to privacy laws (even if they work for your own company).  Even worse, you can’t speak to them directly.  Gone are the days when the IT technician could call up a user and say, “I’m having trouble reproducing your problem.  Can you give me some more details?”  Moreover, web users do not have your phone number either.  They do not even know that you exist.  You might never be able to answer the most important IT questions:

- Who are you?
- What is your environment and how is it different?
- What were you doing when this problem occurred?

If you can’t get the answers to those questions, you have severely limited your ability to solve problems when they arise.  Your job has just moved several notches up the “difficulty” meter.

The only people that web users might interact with are Customer Support agents.  Help desks have taken over the IT department’s traditional relationship with the end user.  That means that all of the details that the IT department needs in order to do its job must be gathered by the help desk.  In fact, all communication with the end user is filtered through the help desk.  A survey done by HDI in 2007 indicated that 70% of all known issues are reported via the help desk.  In other words, of the known issues, a small minority – less than one third – is being found by the IT staff and the majority is being reported by users after they’ve had a chance to get upset.  Moreover, since many CS departments are staffed by non-technical people in a high-turnover environment, the likelihood of your vital debugging information being the victim of a broken telephone effect is very high.  And don’t forget to factor in the impact of having an out-sourced help desk that might not even be clear on what your application is all about.

The loss of the relationship with the user is one whose negative impact cannot be adequately measured.

If you only knew what you don’t know …

And it gets worse.  A report released by Transversal last December indicates that the average corporate web site in the UK can take anywhere from 30 hours to 116 hours to respond to support requests via email.  This is driving customers to switch to phone support.  This, in turn, discourages users from reporting problems, since the phone is a more time-intensive method.  Since most IT departments – whether they realize it or not – are largely relying on users to find the problems, this is a serious issue.  How many problems are going unreported?

Several years ago, I was working as a consultant for a large telecommunications company with a base of about 11 million users.  Needless to say, they were concerned with the welfare of such a large customer base.  They relied on an internally produced survey indicating that users only reported 10% of the problems they experienced.  Of the ones that reported problems, a minority consisted of extremely diligent users who felt it was their duty to report issues whereas the majority tended to be extremely disgruntled users who were so angry that they insisted on taking the time to be heard.  This implies that everyone else felt that it wasn’t worth the time or effort to report a web glitch and either tried their transaction later or gave up entirely.

If we take that survey seriously, then the real pool of issues is an order of magnitude larger than you believe it is – e.g., if you know of 100 issues, then there are another 900 that you might never find out about.  This also means that the IT department’s internal visibility into the health of their applications is an order of magnitude smaller than they believe.  If only 10% of actual issues ever become known, and 70% of those known issues arrive via the help desk, then only 3% of actual issues are being found by the IT department, with another 7% being reported by the help desk.  This gigantic blind spot should be an unacceptable liability to any IT department.
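To make that back-of-envelope arithmetic concrete, here is a minimal Python sketch.  It simply combines the two survey figures quoted above (10% of issues become known, 70% of known issues come via the help desk) and makes the same simplifying assumption as the text, namely that the two figures can be multiplied together.  The function name and parameters are hypothetical and chosen only for illustration; this is not part of any product.

```python
def estimate_visibility(known_issues, known_pct=10, help_desk_pct=70):
    """Rough estimate of how many issues stay hidden.

    known_pct:     percent of actual issues that ever become known
                   (the telecom survey figure quoted in the post).
    help_desk_pct: percent of known issues reported via the help desk
                   (the HDI survey figure quoted in the post).
    Both defaults are the article's numbers, not measured values.
    """
    # If known issues are only known_pct of the total, scale up.
    total_issues = known_issues * 100 / known_pct
    unreported = total_issues - known_issues

    # Split the known issues between the help desk and the IT staff.
    via_help_desk = known_issues * help_desk_pct / 100
    found_by_it = known_issues - via_help_desk

    return {
        "total_issues": total_issues,
        "unreported": unreported,
        "it_pct_of_total": 100 * found_by_it / total_issues,
        "help_desk_pct_of_total": 100 * via_help_desk / total_issues,
    }


if __name__ == "__main__":
    estimate = estimate_visibility(known_issues=100)
    for name, value in estimate.items():
        print(f"{name}: {value:g}")
```

Plugging in your own numbers is the interesting part: even if you believe your users report far more than 10% of what they experience, the hidden pool of issues stays uncomfortably large.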

If, in the absence of real contact with your users, you are relying on your QA lab and your help desk to find and address your web problems, I wish you much luck.  Your work is cut out for you.  Instead, you need to start looking at ways of monitoring what is happening to your users in real time.  I would strongly recommend that you start thinking about adding Real User Performance Monitoring to your tool box.  Don’t wait for them to tell you that something’s wrong.  Get ahead of that tsunami.  There are excellent tools out there to help you with this problem. 

What makes a “must-have” IT product?

February 26th, 2008 Posted by: Tony Tissot

Patrick Gardella, of Discovery Communications, recently spoke to Network World and said about Coradiant TrueSight, “Basically, it allows us to identify very rapidly what is happening with actual users on the site, and then it helps us debug those things.”

“With our huge online shopping site — and other Web systems that require major user interaction – users have problems. When that happens, we get e-mails saying, ‘Your site is broken’ and not much more. The reason I like Coradiant is that it offers a very simple, easy-to-use appliance that can find out what’s happening with those individual users, as well as how many other people are having those same problems.”

For the full article see: Network World