
Coradiant


Using Existing Technology to Track Users Reliably


Friday, August 29th, 2008 Posted by: Jonathan Ginter

I spend a portion of my time working with Coradiant customers on session tracking strategies that ensure a complete view of the end-user experience. Session tracking gives us the ability to see clearly every action, every page and every object associated with a user. Proper session tracking achieves what we refer to as user awareness. This is different from identity awareness, which allows us to put a face to the session. I’ve written in more detail about the differences between identity awareness and user awareness, but I can recap briefly for the purpose of this article.

Identity awareness means that you know the unique ID of a specific user on the site, allowing you to put a name (or even a face) to that user’s activity – e.g., user XYZ can now be referred to as “Jonathan” or “jginter”.  Oddly enough, this is fairly simple since it only requires that users identify themselves (by providing an ID) at least once during their session – e.g., as part of a login procedure, etc.  If even one hit during the session carries the user’s personal ID, we can be configured to pull it out and place it on the session we are tracking.
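To make this concrete, here is a minimal Python sketch that assumes the hits of one session have already been grouped together. The hit fields (`session_id`, `url`, `user_id`) and the helper `identify_session` are hypothetical illustrations, not Coradiant's actual configuration. The point is that a single identifying hit is enough to name the whole session.

```python
from typing import List, Optional

# Hypothetical hit records for ONE session. Only the login hit carries the
# user's ID (e.g., captured from the login form post).
session_hits = [
    {"session_id": "abc123", "url": "/index.jsp"},
    {"session_id": "abc123", "url": "/login.jsp", "user_id": "jginter"},
    {"session_id": "abc123", "url": "/account.jsp"},
]

def identify_session(hits: List[dict]) -> Optional[str]:
    """Return the first user ID found on any hit of the session, if any."""
    for hit in hits:
        user_id = hit.get("user_id")
        if user_id:
            return user_id
    return None  # user-aware, but not identity-aware

# Label every hit in the session with the identity found on just one of them.
user = identify_session(session_hits)
labeled = [{**hit, "user": user} for hit in session_hits]
print(user)  # jginter
```

Without the user-awareness step (grouping hits into sessions in the first place), there would be no session to attach the name to.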

I just said a mouthful without underscoring it, though.  I said that we were tracking a session already, which means we would have already achieved user awareness.  This is no small feat, since it requires that we be able to identify the subset of hits that represent the activity of a single unique user.  This is the foundation of any good monitoring solution.  Moreover, we must pick these related needles out of the haystack of traffic we are monitoring.  Ironically, this could be very simple, but most sites make it very hard.  Fortunately, a couple of simple changes can resolve this.
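As a rough illustration of picking those needles out of the haystack, the Python sketch below groups raw hits by a session cookie and sets aside any hit that carries none. The cookie name `JSESSIONID` and the hit layout are assumptions for the example; a real monitoring appliance works on captured HTTP traffic rather than tidy dictionaries.

```python
from collections import defaultdict
from http.cookies import SimpleCookie

# Assumed cookie name; substitute whatever your web servers actually issue.
SESSION_COOKIE = "JSESSIONID"

def sessionize(hits):
    """Group hit URLs by session ID; hits that carry no session cookie are anonymous."""
    sessions = defaultdict(list)
    anonymous = []
    for hit in hits:
        jar = SimpleCookie()
        jar.load(hit.get("cookie", ""))  # the hit's Cookie request header, if any
        morsel = jar.get(SESSION_COOKIE)
        if morsel is not None:
            sessions[morsel.value].append(hit["url"])
        else:
            anonymous.append(hit["url"])
    return dict(sessions), anonymous

hits = [
    {"url": "/index.jsp", "cookie": "JSESSIONID=abc123"},
    {"url": "/css/site.css"},  # the browser sent no cookie with this hit
    {"url": "/cart.jsp", "cookie": "JSESSIONID=abc123; theme=dark"},
    {"url": "/index.jsp", "cookie": "JSESSIONID=def456"},
]
sessions, anonymous = sessionize(hits)
print(sessions)   # {'abc123': ['/index.jsp', '/cart.jsp'], 'def456': ['/index.jsp']}
print(anonymous)  # ['/css/site.css']
```

Every hit that lands in `anonymous` is a piece of the user's experience that cannot be attributed to anyone.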

As I have said before, most sites host traffic that is unintentionally anonymous – i.e., there is nothing on the hit that would identify who it belonged to.  Even sites that diligently place identifiers in their traffic are likely to have anonymous traffic that they are not aware of.  Why is that?  The simple fact is that most sites suffer from tunnel vision.  They focus on their most important traffic – JSP, HTML, ASP, etc.  These hits represent their application, as they perceive it.  When they track a user’s progression through their site, these are the URLs that they care most about.  What they often overlook are the support files – stylesheets, JavaScript, images, etc.  Although secondary in importance, these files can be critical to understanding the performance seen by the user.  Organizations must solve this problem in order to monitor their traffic properly.
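One quick way to check a site for this kind of tunnel vision is to count the anonymous hits by kind. The Python sketch below is a hypothetical audit, assuming hits have already been parsed into records with an optional `session_id`; the extension list is an assumption you would adjust to your own site.

```python
from collections import Counter
from pathlib import PurePosixPath
from urllib.parse import urlsplit

# Assumed list of "support file" extensions; extend it to match your site.
SUPPORT_EXTENSIONS = {".css", ".js", ".gif", ".jpg", ".jpeg", ".png", ".ico"}

def anonymous_hits_by_kind(hits):
    """Count hits with no session ID, split into application pages and support files."""
    counts = Counter()
    for hit in hits:
        if hit.get("session_id"):
            continue  # already attributable to a user
        path = urlsplit(hit["url"]).path  # drop any query string
        suffix = PurePosixPath(path).suffix.lower()
        counts["support file" if suffix in SUPPORT_EXTENSIONS else "application page"] += 1
    return counts

hits = [
    {"url": "/index.jsp", "session_id": "abc123"},
    {"url": "/css/site.css"},
    {"url": "/js/menu.js?v=2"},
    {"url": "/img/logo.gif"},
    {"url": "/search.jsp"},
]
print(anonymous_hits_by_kind(hits))  # Counter({'support file': 3, 'application page': 1})
```

If the "support file" bucket dominates, the problem is exactly the one described above: the application pages are tagged, but their stylesheets, scripts, and images are not.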

There are two basic problems that need to be addressed:

  1. Failing to use cookies to identify sessions
  2. Failing to control the domain of those cookies

Whatever identifier is used on the primary URLs should also be present on all secondary URLs.  For applications that use something other than a cookie to carry session IDs, this can be a challenge, if not downright impossible.  Switching to cookies allows the browser – the most authoritative source for identifying a unique user – to properly label every hit using a natural aspect of the protocol.  Even when browsers do not support cookies, web servers have been designed to rewrite the URLs in the content being sent so that the session IDs are embedded in the URL paths themselves.  The beauty of this is that browsers and web servers are designed to do this with little effort on the part of the application developer.  All that is required is that the web server be configured to open a new session for any traffic it sees – something that is not usually done if the developers are trying to maintain a stateless application.
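From the monitoring side, this means a session ID can be recovered from either place. Here is a minimal Python sketch assuming the Java servlet convention, where the session travels in a `JSESSIONID` cookie and URL rewriting appends `;jsessionid=<id>` to the path; other platforms use other names, so treat `SESSION_NAME` as an assumption.

```python
import re
from http.cookies import SimpleCookie
from typing import Optional
from urllib.parse import urlsplit

# Assumed session parameter name: Java servlet containers, for example, use a
# JSESSIONID cookie and fall back to rewriting URLs as /path;jsessionid=<id>.
SESSION_NAME = "jsessionid"
_PATH_PARAM = re.compile(r";" + re.escape(SESSION_NAME) + r"=([^;/?#]+)", re.IGNORECASE)

def extract_session_id(url: str, cookie_header: str = "") -> Optional[str]:
    """Prefer the session cookie; otherwise look for an ID rewritten into the URL path."""
    jar = SimpleCookie()
    jar.load(cookie_header)
    for name, morsel in jar.items():
        if name.lower() == SESSION_NAME:
            return morsel.value
    match = _PATH_PARAM.search(urlsplit(url).path)
    return match.group(1) if match else None

print(extract_session_id("/index.jsp", "JSESSIONID=abc123"))          # abc123
print(extract_session_id("/shop/cart.jsp;jsessionid=ABC123?item=7"))  # ABC123
print(extract_session_id("/img/logo.gif"))                            # None
```

Either way, the ID arrives on every hit without the application having to do anything special.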

Recommendation #1: all web servers should be configured to track sessions even if the application is stateless.  Opening a session does not mean you are required to maintain any state.
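To show how little "opening a session" needs to involve, here is a hypothetical Python sketch of the per-request logic: reuse the browser's session cookie if it sent one, otherwise mint a random ID and emit a `Set-Cookie` value. Nothing is stored on the server; the cookie name `SESSIONID` and the function `ensure_session` are made up for illustration, and in practice your web server's own session support would do this for you.

```python
import secrets
from http.cookies import SimpleCookie
from typing import Optional, Tuple

COOKIE_NAME = "SESSIONID"  # hypothetical cookie name

def ensure_session(cookie_header: str) -> Tuple[str, Optional[str]]:
    """Return (session_id, set_cookie_value); set_cookie_value is None if the browser already has one."""
    incoming = SimpleCookie()
    incoming.load(cookie_header)
    if COOKIE_NAME in incoming:
        return incoming[COOKIE_NAME].value, None
    # No server-side state is kept: the ID exists only to label the traffic.
    session_id = secrets.token_hex(16)
    outgoing = SimpleCookie()
    outgoing[COOKIE_NAME] = session_id
    outgoing[COOKIE_NAME]["path"] = "/"  # send it back for every path on this host
    outgoing[COOKIE_NAME]["httponly"] = True
    return session_id, outgoing[COOKIE_NAME].OutputString()

sid, set_cookie = ensure_session("")        # first visit: a new cookie is issued
again, none = ensure_session(f"SESSIONID={sid}")  # later hits: the same ID comes back
```

The application itself can remain completely stateless; the cookie simply rides along on every request.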

So now your web servers are diligently placing a session cookie on all traffic that they serve, which will solve the problem … most of the time.  You can still be easily defeated by deployments in which some or all of the secondary files are hosted on an alternate server.  This is a problem because it places those files in a different domain, and cookies are sensitive to domains (and paths).  It is considered bad practice to ask browsers to send cookies to servers that don’t care about them (it bloats the network traffic and forces web servers to do extra work for nothing).  Therefore, web servers often configure their cookies so that browsers only send them back to the servers (i.e., domains) that issued them.  Thus, browsers usually do not send cookies set by the primary server to the secondary servers.  This is easy to fix, however, by changing the configuration of the web servers.  Most of the time, the secondary servers share the same parent domain – e.g., www.coradiant.com vs. images.coradiant.com – so setting the session ID cookie’s domain to that parent (.coradiant.com) allows the browser to send it on all traffic to the monitored site.  Note that a cookie cannot span different top-level domains (coradiant.com and coradiant.ca, for example), so the hosts must share a common parent domain for this to work.  That will ensure that all secondary traffic is also clearly tagged with a unique session ID.  Although this goes against best practice, the session ID cookie is typically very small, and the best practice was meant to avoid needlessly sending multiple cookies and/or fat cookies with heavy payloads.  Moreover, what I am suggesting is hardly “needless”, since it solves a significant problem.
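The browser's decision can be sketched as a simplified version of the domain-matching rule in RFC 6265. This Python sketch deliberately ignores paths, the Secure flag, and the public-suffix checks real browsers apply; `cookie_sent_to` is a hypothetical helper, not a browser API.

```python
from typing import Optional

def _domain_match(host: str, domain: str) -> bool:
    """True if host equals domain or is a subdomain of it."""
    return host == domain or host.endswith("." + domain)

def cookie_sent_to(request_host: str, set_by: str, domain_attr: Optional[str] = None) -> bool:
    """Simplified RFC 6265 check: will a cookie set by `set_by` be sent to `request_host`?"""
    host, origin = request_host.lower(), set_by.lower()
    if domain_attr is None:
        # Host-only cookie: returned only to the exact host that set it.
        return host == origin
    domain = domain_attr.lower().lstrip(".")  # a leading dot is ignored
    if not _domain_match(origin, domain):
        return False  # the browser rejects a Domain that does not cover the setting host
    return _domain_match(host, domain)

print(cookie_sent_to("images.coradiant.com", "www.coradiant.com"))                   # False
print(cookie_sent_to("images.coradiant.com", "www.coradiant.com", ".coradiant.com"))  # True
print(cookie_sent_to("images.coradiant.ca", "www.coradiant.com", ".coradiant.com"))   # False
```

The first call is the default behaviour that leaves secondary traffic anonymous; the second is the fix; the third shows why a different top-level domain cannot be covered.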

Recommendation #2: configure the cookie(s) carrying the session IDs with a domain that covers your whole site (e.g., the parent domain shared by all of your hosts).  Each web server vendor may do this differently, so consult the administration manuals for your specific web servers for more information on how this can be done.
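Before touching any server configuration, it can help to work out which domain would actually cover every host that serves your site. The Python sketch below is a hypothetical helper that finds the longest parent domain shared by a list of hostnames. Its "at least two labels" rule is a crude stand-in for the Public Suffix List that real browsers consult, so it would, for example, wrongly accept a suffix like co.uk.

```python
from typing import List, Optional

def common_cookie_domain(hosts: List[str]) -> Optional[str]:
    """Return the longest parent domain shared by all hosts, or None if there is none."""
    reversed_labels = [h.lower().split(".")[::-1] for h in hosts]  # "www.a.com" -> ["com", "a", "www"]
    common = []
    for labels in zip(*reversed_labels):
        if all(label == labels[0] for label in labels):
            common.append(labels[0])
        else:
            break
    # Crude assumption: a shared TLD alone (or nothing) cannot carry a cookie.
    if len(common) < 2:
        return None
    return "." + ".".join(reversed(common))

print(common_cookie_domain(["www.coradiant.com", "images.coradiant.com"]))  # .coradiant.com
print(common_cookie_domain(["www.coradiant.com", "images.coradiant.ca"]))   # None
```

A `None` result is a sign that some content lives outside any domain the session cookie could reach, and that needs to be fixed in the deployment rather than in the cookie settings.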

Reconstructing sessions reliably does not have to be complicated, but people often get in their own way.  These simple recommendations should help solve a world of problems, no matter what monitoring solution you are using.