Blind Spots in Web Application Performance Monitoring
Thursday, August 6th, 2009 Posted by: Jonathan Ginter
Contrary to popular belief, the brain is not a Personal Video Recorder that captures everything your senses report. That would be far too much data for any brain to handle. Instead, the brain sifts through sensory input for relevant data points that it can trust and throws everything else away. The important words in that last sentence are “relevant” and “trust”.
If a data point is not relevant, the brain treats it as a distraction. Well-known studies on Inattentional Blindness and Change Blindness demonstrate that even large-scale events can be filtered out when the brain considers them irrelevant to the task at hand. Similarly, if a data point cannot be trusted, the brain discards it as well (whether your senses can be trusted at all has been hotly debated by philosophers for centuries, but I digress). Trust and relevance are crucial to the brain’s ability to eliminate useless noise and arrive at good results.
The same principles apply to monitoring your web applications. Instead of trying to monitor everything, you should reduce the data flood to the points that are relevant. Moreover, you should draw conclusions only from the most trustworthy tools and methodologies.
For web applications, the most relevant data is the data that directly describes or explains your users’ experience and places it in context. To identify that data, you must be able to draw a direct line from your users’ experience to those data points. If you cannot, you are probably chasing your tail and wasting valuable resources. Bear in mind that many tools cannot draw that direct line from user experience to monitoring data without leaving gaps and requiring logical leaps of faith.
As an example, operations teams love to know whether a database is down. That is valuable data, but is it relevant? If users experienced worse performance around the same time, does that mean that fixing the database will solve the performance problem? Not necessarily. In a well-architected environment, the loss of a single web server, application server or database should have little, if any, effect on the end user’s experience, thanks to clustering and load balancing. Many solutions use time correlation to make exactly this leap of faith, but it only makes unreliable conclusions look enticing.
To draw that line between user experience and environmental monitoring, you need a tool that can see your actual users’ experience and relate it directly to problems in your network, application design, deployment, code quality and so on. Moreover, the tool must prove itself to be a trusted source of information: one that extracts and analyzes the relevant data and returns high-quality results quickly and reliably, without drowning you in irrelevant data.


