Coradiant

Archive for the 'Change impact management' Category

Blind Spots in Web Application Performance Monitoring


Thursday, August 6th, 2009 Posted by: Jonathan Ginter

Contrary to popular belief, the brain is not a Personal Video Recorder, recording everything submitted by your various senses.  That would be too much data for any brain to handle.  Instead, it sifts through sensory input looking for relevant data points that it can trust and throws everything else away.  The important words in that last sentence are “relevant” and “trust”.

If a data point is not relevant, then it is considered to be a distraction.  There are well-known studies on Inattentional Blindness and Change Blindness which demonstrate that even large-scale events can be filtered out by the brain if they are considered irrelevant to the task at hand.  Similarly, if the data point cannot be trusted, the brain tosses it out as well (whether your senses can be trusted has been a heated debate in philosophy for centuries, but I digress).  Trust and relevance are crucial to the brain’s ability to eliminate useless noise and derive good results.

These same principles apply to monitoring your web applications.  Instead of monitoring the universe, you should be reducing your data flood to those points that are relevant.  Moreover, you should only be using the most trusted tools and methodologies to draw conclusions.

For web applications, the most relevant data is the data that directly describes or explains your user’s experience and places it in context.  In order to identify that data, you must be able to draw a direct line from your user’s experience to those data points.  If you cannot do that, you are probably chasing your tail and wasting a lot of valuable resources. It is important to realize that a lot of tools cannot draw a direct line from user experience to monitoring data without leaving a few gaps and logical leaps of faith.

As an example, operations teams love to know whether a database is down.  Although this is valuable data, is it relevant?  If users experienced worse performance around the same time, does that mean that fixing the database will solve the performance problem?  In fact, in a well-architected environment, the loss of a web server, app server or database should have little, if any, effect on the end user’s experience due to clustering and load-balancing. A lot of solutions love to use time correlation as a magnificent leap of faith, but it simply makes unreliable conclusions look enticing.

To draw that line between user experience and environmental monitoring, you need a tool that can see the actual users’ experience and is able to relate it directly to problems in your network, application design, deployment, code quality, etc.  Moreover, it must prove itself to be a trusted source of information, returning results quickly and reliably without drowning you in irrelevant data.  In other words, it must be trusted to extract and analyze relevant information and return high-quality results.

Green code


Thursday, October 18th, 2007 Posted by: Alistair Croll

I recently wrote a post for GigaOm’s Earth2Tech site on “Green Code.” The idea is that the quality of code matters. Two coders writing code for the same application can produce results that differ tremendously in efficiency. And that can translate to big differences in power consumption and resource costs, particularly in a virtualized or on-demand environment.

Over here on the Coradiant blog, I can speculate a bit more specifically about what this means. One of the interesting things you can do with user experience is to measure the total processing involved in a page or a user visit.

Because much of the delay on the Internet comes from network performance, two applications with significantly different host efficiency might seem as fast as one another to an end user, so you can’t really measure this just by trying two sites.

But the precision of Real User Monitoring technologies makes even millisecond differences in host processing time clear. And while web operators usually look at average (or percentile) host time, one of the more unusual ways to measure host time is to sum it. This effectively shows you the “total thinking done” for a user’s session.

This can be the start of some pretty fascinating math. Once you know host time per session, you can see how many host-seconds your infrastructure devotes to a visitor. This can show you things like whether a certain class of users is consuming more than its fair share of “heavy” searches.
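Here is a minimal sketch of that summing step, assuming you can export per-hit records with a session identifier, a user class, and a host time. The record layout, the field order, and the sample values are all hypothetical; they are not a TrueSight export format.

```python
from collections import defaultdict

# Hypothetical per-hit records: (session_id, user_class, host_time_seconds).
# The values are made up for illustration.
hits = [
    ("s1", "member", 0.120),
    ("s1", "member", 0.340),
    ("s2", "guest", 0.080),
    ("s3", "member", 1.900),
    ("s3", "member", 2.100),
]


def host_seconds_by(hits, key_index):
    """Sum host time over all hits that share the field at key_index."""
    totals = defaultdict(float)
    for hit in hits:
        totals[hit[key_index]] += hit[2]
    return dict(totals)


# "Total thinking done" per session, and per class of user.
per_session = host_seconds_by(hits, 0)
per_class = host_seconds_by(hits, 1)

for user_class, seconds in sorted(per_class.items()):
    print(f"{user_class}: {seconds:.2f} host-seconds")
```

Grouping by user class instead of by session is what surfaces a group of visitors that consumes a disproportionate share of host time.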

(Incidentally, on the Coradiant.com site, this often reveals blog spammers from China posting comments about their various vitamins, and more questionable offerings.)

But you can also tie this host time back to IT costs.

I’m teaching a course on data center growth as part of Interop’s Data Center Summit in New York next week (more on this in a later post). In preparing for that session, I spent a lot of time looking at the cost models behind on-demand hosting, managed servers, colocation, and global CDNs. And it made me realize there are good ways to model IT costs, even though those costs vary widely from one business to the next.

Let’s look at combining these two metrics — host time and IT costs — to better understand the business impact of IT.
If you have a good model for IT costs (such as colocation, power, cooling, and storage) and you divide your monthly IT costs by the sum of host time for the month, you know your IT-cost-per-host-second. Leave bandwidth costs out of that total, since they aren’t related to host time.

If you then multiply host-seconds for each user session by that IT cost, you can calculate how much each user session costs you.
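The arithmetic in the last two paragraphs can be sketched as follows. The function names and the dollar and host-second figures are assumptions chosen for illustration, not real cost data.

```python
def cost_per_host_second(monthly_it_cost, monthly_host_seconds):
    """Monthly IT cost (excluding bandwidth) divided by total host time."""
    if monthly_host_seconds <= 0:
        raise ValueError("monthly_host_seconds must be positive")
    return monthly_it_cost / monthly_host_seconds


def session_cost(session_host_seconds, rate):
    """Cost of one user session at a given IT-cost-per-host-second."""
    return session_host_seconds * rate


# Hypothetical month: $50,000 of IT cost spread over 2,000,000 host-seconds.
rate = cost_per_host_second(50_000.0, 2_000_000.0)

# A hypothetical session that consumed 4 host-seconds in total.
cost = session_cost(4.0, rate)

print(f"rate: ${rate:.4f} per host-second, session cost: ${cost:.2f}")
```

With these made-up numbers the rate works out to 2.5 cents per host-second, so a session that uses 4 host-seconds costs 10 cents.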

This is an excellent basis for evaluating change across releases. It will reflect increased hosting costs (such as the introduction of an application accelerator), reductions in delay (such as a drop in host time when the AFE’s application acceleration functions reduce the load on servers), and even changes in pages per session.

You can actually report average IT cost per user session.

As a result, you’ll now know the actual impact of that deployment: Did the reduction in IT-cost-per-host-second outweigh the investment in the AFE? How many weeks did it take to pay the cost back? Is the additional site navigation costing us more?
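The payback question can be roughed out as below. This is a sketch under simple assumptions: a one-time investment, a steady number of sessions per week, and a cost per session measured before and after the deployment. Every figure is hypothetical.

```python
import math


def payback_weeks(investment, cost_before, cost_after, sessions_per_week):
    """Weeks needed for per-session savings to repay a one-time investment.

    Returns math.inf when the change did not reduce the cost per session.
    """
    weekly_saving = (cost_before - cost_after) * sessions_per_week
    if weekly_saving <= 0:
        return math.inf
    return investment / weekly_saving


# Hypothetical: a $30,000 accelerator, cost per session falling from
# $0.10 to $0.08, and 100,000 sessions a week.
weeks = payback_weeks(30_000.0, 0.10, 0.08, 100_000)
print(f"payback in roughly {weeks:.1f} weeks")
```

With these numbers the saving is about $2,000 a week, so the investment pays for itself in roughly 15 weeks. If the cost per session went up instead, the function returns infinity, which is one way of saying the change never pays back on this measure.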

Of course, there are many other benefits to reducing host time, from user satisfaction to increased capacity to reduced SLA refunds. But this idea of IT-cost-per-host-second is a nice, concrete way to think about what code changes or other modifications to your operations do to your business.

Now back to the fascinating sessions at Web 2.0.

Where should I use User Experience?


Tuesday, April 3rd, 2007 Posted by: Alistair Croll

User experience has many applications. We’ve seen people adopt it pretty aggressively for incident management and service level management. But we’re also working with customers and third-party partners on a number of other applications.

User performance data joins test-based and device-based monitoring as one of the three fundamental building blocks of web performance management. And just as testing is used everywhere from capacity planning to reachability monitoring to penetration testing, real user monitoring is finding a wide range of applications.

One of the reasons for this is its relevance to groups outside of IT. Business information such as the value of a transaction or the name of a subscriber is part of the data that’s collected, so it’s much more than just performance information. It’s a real-time feed of user activity that gives the business insight into its online interactions.

I put together the circle diagram below to illustrate some of the ways that user experience is being employed.

The User Experience Management circle

Starting with the fundamentals — good, accurate, detailed per-hit and aggregate data collected from not only web pages but also Rich Internet Applications — user experience applies to all of these areas:

  • User Analytics, in concert with a web analytics tool to look at conversion and search engine sources. For some web applications, user experience is the only way to collect transaction information since the site isn’t publicly deployed.
  • QA and testing, both at the start of the test cycle (recording a user session for later use in a load-testing application) and at the end (watching code as it goes into production to see if QA missed any issues).
  • Helpdesk, for problem diagnosis and user assistance.
  • Billing, for generating usage reports by subscriber or customer and assessing bills for excessive use.
  • Dispute resolution, using facts instead of anecdotes to see what really happened and resolve an issue fairly.
  • Incident management, in which problems are detected as soon as a user experiences them — before the phone rings — and resolved using the forensic data that was recorded from the web session.
  • Service Level Management, generating performance and availability reports by customer, geography, or branch office.
  • Baselining, watching a particular function, server, or site to get an idea of what “normal” is in order to set thresholds or measure long-term growth.
  • Capacity planning, in which the relationship between traffic (load) and latency (performance) is calculated over time to see how much a site can handle before becoming unacceptably slow.
  • Compliance, keeping a record of transactions for long periods of time in order to comply with industry law or regulations or to protect the company from risk.
  • Fraud detection, in which user traffic is analyzed to look for patterns of anomalies or inappropriate use — from hack attempts to site harvesting to sharing of account logins.

Our customers are building many of these themselves, using third-party and open-source tools alongside our equipment. We’re also partnering with a number of companies to test and document proven integrations. Our new VP of Business Development, Ali Hedayati, has his hands full with all of these relationships and others.

Whatever the final result, there’s no doubt that user experience is a ripe field for innovation, and that it’s transforming many parts of an organization far beyond simple incident detection.

Change Impact Management


Sunday, November 12th, 2006 Posted by: Alistair Croll

One of the hot topics for many of our customers is using Real User Monitoring to measure the impact of a change. There are plenty of changes that sites experience every day, from new content or new code to modifications to routing or end-user browsers. (Hey look! All your customers just switched to Firefox beta 2 and IE 7!) And the consequences of those changes can be far-reaching, from slower pages or more downtime to harder-to-identify problems like reduced capacity or dissatisfied users.

At our customer event this week I’ll be presenting a more detailed look at how to use our flagship product, TrueSight, to perform change impact analysis. It’s really a collaboration between the agents of change (engineering) and those who have to deal with the change (operations). And it usually involves a change planning board of some kind.

Change management schema

Here’s a high-level workflow of these three groups, working (hopefully) in concert.

  • It starts with the definition of what’s going to change.
  • Following that, the operations team uses TrueSight to measure the current state and the engineering team monitors performance during their test cycles. This helps to identify disconnects between how the current application performs “in the wild” and how the anticipated change will work.
  • Once the change completes QA, the same Watchpoint is used to measure the performance after the change.
  • The organization can decide whether or not to quickly roll back from the change; they can also generate a change impact report to show what was gained (or lost) by the change.

It’s an interesting topic, and something that’s largely ignored by web operations teams at their own peril. The discussions at CUG2006 should be fascinating.

Change impact management


    Tuesday, August 29th, 2006 Posted by: Alistair Croll

    One of the best things about this job is the number of people I get to meet and talk with. It lets me see the patterns that form across the industry in fascinating and sometimes unpredictable ways.

    One of the big shifts I’m seeing a lot of lately is a move away from simple error detection towards change impact management. It’s been said that the only constant is change, and this is certainly true in web applications. One of our customers rolls out six or eight changes a day—and these are code changes!

    Early on, people thought Real User Monitoring would be a great way to detect problems and get to work on them before the phone rang. And it is. But an increasing number of people are using our technologies to measure the before and after of a change.

    The change may be as small as a memory upgrade or a new layout, or as big as a data center move or a switch from Java to .NET. But in every case, they want to know two things:

    • Did the change I made do what it was supposed to?
    • Did I inadvertently break something when I changed that?

    The two questions seem almost the same. But they’re different in subtle ways. An engineer might make a change to address a problem (like poor performance). Or they might try to add a new feature or function. In the former case, they have an intended outcome, reduced latency, that they want to verify. In the latter, there’s not supposed to be any impact on what already works.

    One of the most important things to know is whether a new version can handle the same traffic as its predecessor. Our customers do this by isolating the specific function (using a technology called a Watchpoint) and then plotting the relationship between performance and load. They then make the change, and compare before and after.

    This lets them say things like, “before, 95 per cent of users got the report in under 5 seconds when we had 40 hits a second; now, that’s only true at 25 hits a second. That change cost me 15 hits a second, you bonehead!” To be fair, that’s usually what the product manager says to the coder; and more often than not, we see improvements from release to release. But you get the idea.
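A before-and-after comparison like that one can be sketched as follows. It assumes you already have, for each observed load level, the fraction of users who got the page within the target time. The sample points are invented so that they reproduce the 40 versus 25 hits a second in the quote, and the function name is hypothetical rather than part of any product.

```python
def max_load_meeting_target(samples, target=0.95):
    """Highest load at which the target fraction of users is still served in time.

    samples: iterable of (hits_per_second, fraction_within_time_limit).
    Walks the load levels in ascending order and stops at the first one
    that misses the target. Returns None if even the lowest load misses it.
    """
    best = None
    for load, fraction in sorted(samples):
        if fraction < target:
            break
        best = load
    return best


# Hypothetical measurements for one function, before and after a release.
before = [(10, 0.99), (25, 0.98), (40, 0.95), (50, 0.90)]
after = [(10, 0.99), (25, 0.95), (40, 0.88), (50, 0.80)]

capacity_before = max_load_meeting_target(before)
capacity_after = max_load_meeting_target(after)
lost = capacity_before - capacity_after

print(f"before: {capacity_before} hits/s, after: {capacity_after} hits/s, lost: {lost}")
```

Stopping at the first load level that misses the target is a deliberately conservative choice: a release that recovers at a higher load after dipping below 95 per cent is still treated as having lost capacity.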

    The other thing that’s interesting about change impact management is the number of people who care. Operations teams don’t like change. In fact, a member of our support team has a big sign that says, “what changed?” taped above his desk, because change is nearly always the root cause of a problem. On the other hand, engineers live for change. They get paid for it. Whether they’re altering a network, or modifying a piece of code, or rewriting a query, they’re always changing something.

    More and more of the people using Coradiant’s products are engineers. And that signals an expansion of the role of Real User Monitoring within IT, as everyone starts to realize how they can benefit from measuring actual users.