Classic GI=GO Equation Holds True for Web Analytics
Aug 20, 2010 Statistics, Web Analytics
Garbage In = Garbage Out. People who spend their working hours analyzing numbers generally come to this realization. It is true for modeling, it is true for forecasting, and it is completely true when it comes to website analytics.
The chain of events looks something like this:
1. Someone visits a website integrated with a web analytics platform like Google Analytics, Webtrends or Omniture.
2. A web page visitor either navigates to a tracked page or performs a tracked action.
3. A script is executed in the browser, sending data to the analytics platform.
4. The data is added to the datastore.
5. The data is summarized and analyzed.
Problems arise when you assume that steps 1 through 4 are happening correctly and move straight on to the reports and data that come out of the process. Oddly, most site developers I have met who are instrumenting a site for web analytics consider their job done and successful if tags fire when they are expected to. They don’t look at the data passed with the tags, and they don’t check what made it into the web analytics platform’s datastore. In software development, anything you don’t check is frequently going to be wrong. If your data is wrong, then all your analysis of it will be just as wrong as the data. Again, there’s the classic equation describing this relationship:
Garbage In = Garbage Out
How do you prevent your data from being garbage? QA and debug the data, that’s how.
Before you use information coming from a web analytics solution, you should (or someone should) do these two tests:
1. Web Analytics Data Test Number One: Is the data being passed correctly?
Use an HTTP header or request-inspection tool of some kind to see which tags are being invoked and what data they are passing to the web analytics platform. I use WASP. It shows you what kind of tag is fired when you click on navigation and site functions, and then it lists the data values the tag passes. The test is this:
Step 1: Navigate to every page in the site. A pageview should be generated for every page you view, and it should have the correct page name passed with it. Implementation of this is usually OK for standard HTML sites, but is error-prone for Flash sites.
Step 2: Click every function you are tracking as an action or event. Check that an action or event is generated for each one you click, that the tag it fires classifies it correctly (as an event, not a page view), and that the name and category assigned to the action are what they should be.
(Steps 3-n): Anything else you have tagged for measurement, like ad placements for an ad server, should also be clicked systematically to see that everything that is supposed to be captured about ad impressions is actually captured and passed when the tag is fired.
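If you copy the request URLs out of your inspection tool, the value checks in the steps above can be partly automated. What follows is only a minimal sketch: the collection URL and the parameter names (`type`, `category`, `name`, `page`) are made up for illustration, and every platform uses its own parameter names, so substitute whatever your tags actually send.

```python
from urllib.parse import urlsplit, parse_qs

def check_tag(request_url, expected):
    """Compare one captured tag request against the values we expect it to carry.

    Returns a list of human-readable problems; an empty list means every
    expected parameter was present exactly once with the expected value.
    """
    params = parse_qs(urlsplit(request_url).query, keep_blank_values=True)
    problems = []
    for key, want in expected.items():
        got = params.get(key)
        if got is None:
            problems.append(f"{key}: missing (expected {want!r})")
        elif got != [want]:
            # Covers both a wrong value and a parameter sent more than once.
            problems.append(f"{key}: got {got!r}, expected {want!r}")
    return problems

# Hypothetical request captured after clicking a tracked "play" button.
captured = ("https://stats.example.com/collect"
            "?type=event&category=video&name=play%20intro&page=%2Fhome")
expected = {"type": "event", "category": "video",
            "name": "play intro", "page": "/home"}

for problem in check_tag(captured, expected):
    print(problem)
```

An empty result only tells you that this one request carried the values you listed. You still have to click through everything to make sure a request is fired in the first place.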
2. Web Analytics Data Test Number Two: Is the data making it into the database(s) correctly?
Set up your full site in staging so it will have a recognizable hostname that you can filter by in your reporting tool. Tell everyone else not to play with the version in staging for a while.
Step 1: Navigate to every page in the site in a systematic order. Do this several times. Make sure you keep track of how many times each page is viewed.
Step 2: Click every function you are tracking as an action or event. Do this several times. Make sure you keep track of how many times each action is done.
(Steps 3a-3n): Anything else you have tagged for measurement, like ad placements for an ad server, should also be clicked systematically – count the impressions and count the clicks.
Step 4: Click through every funnel you have set up, several times, all the way to the goal. If you have goals like time on site or number of pages viewed, make sure you stay long enough and look at enough pages to meet these goals.
If there are required pages in your funnels, make sure you pass through them. Again, keep updating the tallies of page views and actions as you do all this.
Step 5: Wait until the data is likely to be available for reporting. Latency varies by platform. Pull reports, filtering for your hostname. You should see that the numbers of page views, actions, ad impressions, ad clicks, goals/conversions, and funnel stages match what you did in steps 1-4. If they do not, you probably either have:
a. a tagging problem (wrong tag, misimplemented tag, redundant tags, etc.)
b. a setup problem (e.g. definitions for goals/conversions, funnels)
c. other users muddying up your data by hitting the site in staging while you are testing
d. a more exotic and difficult problem
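The comparison in step 5 is simple arithmetic, but it is easy to misread by eye when there are many pages and actions. Here is a rough sketch of it, assuming you kept your tallies as a list of hits and typed in (or exported) the reported counts by hand. The labels such as `pageview:/home` are hypothetical, not any platform's format.

```python
from collections import Counter

def compare_counts(expected, reported):
    """Return {item: (expected_count, reported_count)} for every mismatch.

    Items missing from either side are treated as a count of zero.
    """
    diffs = {}
    for item in sorted(set(expected) | set(reported)):
        e, r = expected.get(item, 0), reported.get(item, 0)
        if e != r:
            diffs[item] = (e, r)
    return diffs

# Tally kept by hand while clicking through the staging site (made-up data).
tally = Counter([
    "pageview:/home", "pageview:/home", "pageview:/products",
    "event:video/play", "goal:signup",
])

# Counts read from the reporting tool, filtered to the staging hostname.
reported = {"pageview:/home": 4, "pageview:/products": 1, "event:video/play": 1}

for item, (e, r) in compare_counts(tally, reported).items():
    print(f"{item}: did {e}, report shows {r}")
```

In this made-up data the home page reports exactly double what was done, which would point toward a redundant tag, and the goal never shows up at all, which would point toward a setup problem. Treat those readings as starting points for debugging, not conclusions.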
If this all sounds like a pain in the hindquarters, compare it to the pain of realizing that you have been reporting erroneous numbers and making business decisions based on them for months or years. Believe me, they could be so far off that you’d have been better off guessing or making numbers up. Do not trust what you cannot verify with test results, or you will have much pain and sadness in your future.
Tags: Data Validation, Debugging, Debugging Web Analytics Data, Google Analytics, Omniture, Web Analytics, Webtrends