Optimization vs. Improvement

The word “optimization” can be used to refer to a mathematical technique for selecting the best combination of controllable factors to maximize the value of an outcome or dependent variable. It can also be a fancy substitute for the word “improvement”, used to project the impression you are applying some kind of mathematical rigor, when in fact you are not.


This entire post may sound like nitpicking, but there is a real issue at stake.

The main difference is this:
– If you are collecting data to map out the entire feasible space of options, then picking the best solution, you are doing “optimization”.
– If you are trying two or three of something, and then picking the winner, you are doing “improvement by trial and error”.
“Optimization” leaves you with some assurance that you have left no improvement on the table. “Improvement by trial and error” may deliver value, but without a sense of whether there are further gains available.

In the realm of marketing analysis, this distinction is further tangled by the fact that there is no way to actually map out all possible creative executions and all the possible ways to deliver them to an audience (unless you believe all the good ideas have been done already, in which case you need historians and not creatives).

So, when someone sells you creative optimization using a multivariate testing engine that automatically spits out thousands of combinations of a background image, a headline, a call to action and a button to click, you may think you are doing real optimization, but that is true only if your collection of copy bits and graphic elements represents the whole feasible range of creative executions and not just what someone could crank out in a week or so. That, IMHO, is not likely. It is really precision masquerading as accuracy or – put differently – marketers opting for the easily quantifiable to the exclusion of the extremely effective.

Not all marketing problems are best approached via formal optimization – particularly those where creative execution is a critical variable. But it is possible to apply good test design and measurement to understand when something works better than what we are currently doing and when it does not. Just don’t call that “optimization” – that is almost like lying.

Based on Google Search Volume, the 2012 Election Looks Close

I decided to take a look at the search volumes over the last 90 days for the terms “Democratic Party” and “Republican Party”.

Google Trends Chart: Republican Party vs. Democratic Party

You can clearly see the convention bounces reflected in the search volumes. For some reason, the debates don’t seem to drive that much difference in search activity.
You can see the spikes on the days of the debates (October 3, October 11, and October 16), but both parties go up about the same amount.
In recent weeks, this election looks close.

What do you think? Are you willing to leave it up to everyone else, or are you going to get out and make your choice count?

How Not To Do An A/B Test

A Homemade Mess

There are a large number of ways to make a hot mess out of an A/B test. Here are five:

1. Don’t Measure Conversions
“We don’t have time to set up conversion tracking. Let’s just decide based on click rate.”
This is a terrible idea. Clicking and converting are two very different things, and click rates are often not correlated with conversion rates. For example, I click on pictures of Ferraris, because I like to look at Ferraris, but you can ask all my friends – I have never bought a Ferrari. I have bought a Mazda, a Toyota, a Datsun and a Ford Maverick. You can show me a Ferrari if all you want is clicks, but show me something I might actually buy if you want conversions.

2. Don’t Do Any Test Size Calculations
Ten minutes of work could tell you that you won’t have enough data to read your test even if you ran it for two years. Are you sure you can’t afford some time to do a Google search for “A/B Test Calculator” and plug some numbers into a form?

I’ll save you even more time. Use this one: ABBA
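If you would rather see the arithmetic than fill in a form, here is a minimal sketch of a test size calculation. It uses the textbook normal-approximation formula for comparing two proportions, which is an assumption on my part and not necessarily what ABBA or any other calculator does. The function names and the default significance level and power are illustrative choices.

```python
from math import ceil, sqrt
from statistics import NormalDist


def visitors_per_arm(baseline_rate, relative_lift, alpha=0.05, power=0.80):
    """Approximate visitors needed in EACH arm of a two-sided test
    comparing two conversion rates (normal approximation)."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # significance threshold
    z_beta = NormalDist().inv_cdf(power)           # desired power
    p_bar = (p1 + p2) / 2
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p2 - p1) ** 2)


def days_to_read(baseline_rate, relative_lift, daily_visitors):
    """Days of traffic needed if visitors are split evenly between two arms."""
    total_needed = 2 * visitors_per_arm(baseline_rate, relative_lift)
    return ceil(total_needed / daily_visitors)
```

Calling `days_to_read(0.02, 0.10, daily_visitors=500)` answers the question "how long would I have to run this to detect a 10% relative lift on a 2% baseline?" If the answer comes back in years, you have saved yourself a test.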

3. Stick With Your Test Size Calculations No Matter What Happens
The test size calculations you did were based on some assumptions: confidence level, the magnitude of the difference you wanted to be able to detect, and the expected performance of the baseline or control. After you’ve run the test for a while you can begin to see where reality and your assumptions have parted ways. What should you do? Most people do repeated significance calculations and quit when they are satisfied with the significance. If you do this, you’ve spent too much time and opportunity cost on your test. You could have quit sooner, had you known about Anscombe’s Stopping Rule, which uses an approach called regret minimization, and you would actually end up with more conversions.

Check it out: A Bayesian Approach to A/B Testing

4. Don’t Think About Gating
What is gating?
Let’s say you have two different versions of a page: Version A and Version B. Let’s say your plan is to rotate them randomly. Let’s say your site and your content are such that most people come to the site repeatedly, say two to six times per week. If you are rotating Version A and Version B completely at random, then most of your users are going to see a blended treatment. This will dilute the measured effect of your test. To fix this, you want to make the version a person sees “sticky” so one group sees only Version A during the test and the other group sees only Version B. That way each group sees a consistent treatment and you will see more of an effect (assuming the differences between A and B are substantial enough).
This is called “gating” and is done by randomly assigning new visitors (people with no gating in their cookie) to Version A or B, and then storing that in their cookie so that the next time they will see the same version.
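The logic is small enough to sketch. In this minimal example a plain dict stands in for the visitor's cookie jar, and the cookie name `ab_version` is hypothetical; a real site would read and write the cookie through its web framework.

```python
import random

COOKIE_KEY = "ab_version"  # hypothetical cookie name
VERSIONS = ("A", "B")


def gated_version(cookies, rng=random):
    """Return the version this visitor should see.

    A visitor with no (or an unrecognized) gating value is assigned a
    version at random, and the choice is stored so every later visit
    gets the same version.
    """
    version = cookies.get(COOKIE_KEY)
    if version not in VERSIONS:
        version = rng.choice(VERSIONS)
        cookies[COOKIE_KEY] = version
    return version
```

The first call with an empty `cookies` dict picks A or B at random; every later call with the same dict returns that same version. Note that a visitor who clears cookies or switches devices will be re-assigned, so gating by cookie is only as sticky as the cookie itself.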

5. Conclude That Your A/B Testing Result is Actually Optimal
An A/B test picks one “best” version for everyone. But isn’t it possible that there are some people in the audience who’d respond best to Version A and others who’d respond best to Version B? For that, you’d need to be able to collect lots of data about what kinds of users respond to the different options, and then you’d need a way to target the two versions at the audiences they work best with. Fortunately such tools exist.
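As a toy illustration of the idea (and nothing like a production targeting tool), the sketch below tallies conversion rates by audience segment and version, then reports the better version for each segment. The segment labels and the tuple layout are assumptions made for the example.

```python
from collections import defaultdict


def best_version_by_segment(visits):
    """visits: iterable of (segment, version, converted) tuples.

    Returns {segment: version with the highest observed conversion rate}.
    """
    # (segment, version) -> [conversions, visitors]
    counts = defaultdict(lambda: [0, 0])
    for segment, version, converted in visits:
        tally = counts[(segment, version)]
        tally[0] += int(converted)
        tally[1] += 1

    rates = defaultdict(dict)
    for (segment, version), (conversions, visitors) in counts.items():
        rates[segment][version] = conversions / visitors

    return {
        segment: max(by_version, key=by_version.get)
        for segment, by_version in rates.items()
    }
```

Slicing the audience shrinks the sample behind each comparison, so each per-segment winner needs its own test size check before you act on it.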

Check out the toolset I work with every day at [X+1]: [X+1] Home Page

When Thomas Friedman Notices, Something Must Really Be Going On


A couple of months ago, a bunch of us at [x+1] started taking a free online course in machine learning taught by Stanford University’s Andrew Ng. It was an offer difficult to refuse: a company called Coursera is offering top-flight university courses for free, and Andrew Ng’s class is among them. Thomas Friedman’s recent New York Times column offered up Dr. Ng’s experience as emblematic of a larger trend in higher education (i.e., online alternatives to an increasingly costly option for young students), but what caught my eye were the numbers.

Ng was quoted in the article:

“I normally teach 400 students,” Ng explained, but last semester he taught 100,000 in an online course on machine learning. “To reach that many students before,” he said, “I would have had to teach my normal Stanford class for 250 years.”

This blew my mind. I have been doing this kind of work for half my life, and I have never run into a group of more than 200-300 of us at one time – and that was at a SAS convention in Vegas! Now 100,000 people are taking this one course per semester!!!! I was immediately imagining a future when:

1. People at parties would understand my answer to “What do you do?” without my having to offer a hyper-simplified reduction (“statistical marketing” is the best I have come up with).
2. It will be easier to find talent in this business (good for building teams, maybe not so good for salaries).
3. The business people I work with would have a reasonable sense of what you can and can’t do with statistics.

Realistically, these numbers need to be discounted for people dropping off because of lack of time, interest, and perhaps aptitude. However, this does point to a major trend in analytics: business people are increasingly aware that they need modern data analytics to function in the new, data-rich digital world. Given the tendency of digital entrepreneurs to flog us with buzzword-laden feather merchantry, we can also be assured of 5-10 years of “big data” blather, but I guess mimicry is the cost of increasing demand for our field.

Statistics’ Greatest Hits, Part I

How many times have you given a data-based presentation that made people burst into applause or made people weep? Have you ever given a talk that changed the world?

By my reckoning this has happened more than once to Hans Rosling, a professor of global health at Sweden’s Karolinska Institute. Two of his TED talks stand out in my mind as outstanding examples of how statistical analysis can be:
a. Presented in a spectacularly effective way
b. Used for good in the world

In the first, referred to by many of my friends as the “Gapminder” TED talk, Dr. Rosling uses animated bubble charts to make (stunningly well) some points about worldwide trends in infant mortality, life expectancy, and distribution of wealth and productivity. The talk shows with shocking clarity how the world has been changing over time, in a form which can explode through ignorance and prejudice and create understanding. If you haven’t seen it, you should (see below).

You can use the Gapminder software to create your own animated Bubble Charts:
Gapminder Desktop download

You can also keep up with the organization that has grown up out of this effort on the Gapminder site: Gapminder.org

The second TED talk I refer to is the “Washing Machine” TED talk. This talk really showcases Rosling’s storytelling and showmanship, but is still at its heart a presentation based on data. I believe, if I could learn to present like this, there is no limit to what I could accomplish.

If you have some other nominees for Statistics’ Greatest Hits, please leave them in a comment.

Measurement is No Substitute for Thinking BIG

Whether you are doing SEO for a site or running paid search for one, running display ad campaigns or social media, everyone is trying to measure the same thing – they are trying to find evidence that what they are doing is worth the money and time it costs.

There is a point of view (not necessarily mine) that says: If you have to do a complicated analysis to see the effect of a marketing initiative, then it wasn’t very effective. There is some truth there. It is easier, statistically speaking, to measure a BIG marketing impact than it is to measure a small one.
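To put a rough number on “easier”: under the usual normal approximation for comparing two conversion rates (my assumption for illustration, not something this post derives), the number of visitors needed in each group is approximately

```latex
n \approx \frac{2\,\left(z_{1-\alpha/2} + z_{1-\beta}\right)^{2}\; p\,(1-p)}{\delta^{2}}
```

where p is the baseline conversion rate, δ is the absolute difference you want to detect, and the two z terms are the standard normal quantiles for your chosen significance level and power. Because n scales with 1/δ², an effect twice as big needs only a quarter of the data, and an effect half as big needs four times as much.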

Reality is complicated, though. If you are engaged in “filling the funnel”, you want to know how that translates – in the long run – into actual sales. However, unless you can hold off on decisions about managing the campaign and the channel until there is data for the whole decision and purchase process, you will have to go with measuring some intermediate impact.

It is almost a certainty that other things will happen to impact sales in the time between your funnel-filling campaign and the sales it ultimately leads to, enough to muddy the waters about how much of your success (or lack of it) came from the lagged effect of your funnel-filling efforts. Unless the effect is big.

This is not to say that measurement is not necessary for early-stage marketing activities, but to say that you have to apply some common sense to your measurement problems, and one bit of marketing common sense is this: think BIG. Now think BIGGER. You should always be aiming to have a big effect – you won’t always succeed in a huge way, but it should not be for lack of trying.

Here’s an idea: Every time you create a campaign, a marketing tactic, or an ad, you should at least TRY to do some creative thinking that taps into one or more sources of disproportionate (on the BIG side) response. What well-defined and targetable group would have a peculiar affinity with your message and your product? What would make them want to know more NOW, click NOW, buy NOW?

What does that have to do with measurement? Two things:
1. Even if your BIG idea doesn’t work, you are actually testing a hypothesis and so you have gotten just a little smarter.
2. Your goal is to produce impact so BIG that you don’t even really need to measure it to know that the effort was ROI-positive. (But you are measuring it anyway, so you can explain why the next one needs to have a bigger budget!) And – falling short of a really BIG goal will get you to positive business results more often than falling short of modest goals.

The Digital Nervous System

The web can function like a giant extension of the human nervous system. Like a spider at the center of a giant global web, you can collect and observe streams of data coming from all over the digital expanse: searches, tweets, forums, blogs, newspaper and magazine sites, press releases, Facebook and LinkedIn. Each time someone looks for or mentions your company or your product you are alerted, and you can choose in that moment to respond to it, ignore it or wait until you have more information.

Does this sound like anything you are doing now? Someone should be doing this for your company, because marketing has increasingly become an ongoing series of conversations (whether you participate in the conversation or not).

EXPERIMENT: DETECTING INSTANT RESPONSE TO MEDIA WITH THE INTERNET

There are several national TV shows that frequently have book authors as guests (the Daily Show, The Colbert Report, The Today Show, Good Morning America). The next time you find yourself in front of one of these shows when an author is on plugging their book, try the following experiment (this will work best with a show with a national audience):

1. Fire up your laptop and go to amazon.com
2. Search in the Books category for the title of the book the author is plugging on the show you are watching
3. Click to the Amazon page for that book.
4. Scroll down past the synopsis and the reviews to the section labelled Product Details, and find the book’s current sales rank on Amazon.

5. Every few minutes while the author is on the show and for a while after that (until you bore of this experiment), hit the F5 key to refresh the page and watch what happens to the book’s sales ranking.

The rank should get better – in real time – as you are sitting there. I have done this several times when my brother-in-law has done TV appearances to promote his books, and it is amazing. Once he was on Oprah Winfrey and we saw the sales rank improve precipitously from 20-something into the top 10 while he was being interviewed.

Now imagine all the other analogous information streams there are available on the internet. If you could get the monitoring automated, just think of how quickly you will know exactly what the world thinks of your new site, your new ad campaign, your new product. Just think of what you’d be missing by NOT knowing.
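Automating the F5 part of the experiment is mostly a matter of polling and logging. Here is a minimal sketch. The `fetch_rank` argument is a hypothetical function you would have to supply yourself (one that returns the current rank as an integer from whatever source you are permitted to query); nothing here talks to Amazon.

```python
import time
from datetime import datetime


def track_rank(fetch_rank, checks=10, interval_seconds=300, sleep=time.sleep):
    """Call fetch_rank() `checks` times, pausing between calls.

    Returns a list of (timestamp, rank) readings in the order taken.
    """
    readings = []
    for i in range(checks):
        readings.append((datetime.now(), fetch_rank()))
        if i < checks - 1:
            sleep(interval_seconds)
    return readings


def biggest_improvement(readings):
    """How many places the rank improved from the first reading to the
    best reading (a lower rank number is a better rank)."""
    first_rank = readings[0][1]
    best_rank = min(rank for _, rank in readings)
    return first_rank - best_rank
```

The same loop works for any of the other streams mentioned above: swap in a function that counts mentions, searches, or posts and you have the beginnings of a monitor.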

EXTRA CREDIT EXPERIMENT #1 – THE TWITTER BUMP

In between rank checks you should check in on Twitter searches for the author’s name and the book’s name. These should also pop during the author’s TV appearance.

EXTRA CREDIT EXPERIMENT #2 – THE GOOGLE BUMP

After a day or so you should go to Google Trends and see what happened to searches for the author’s name and the book’s name. These should’ve spiked on the day the author did the TV appearance. Google Trends doesn’t provide much flexibility about getting more granular (in time) data in a more real-time way, and it looks like the beta for Google Insights for Search has a latency of a couple of days.

GOOGLE EPIDEMIOLOGY – WHO KNEW THEY COULD DO THAT?

Take a look at the Google Flu Trends project (http://www.google.org/flutrends) and you can see what an amazingly useful data source this would be with access to the full detail in real time. It turns out that counting Google searches for flu information is a quicker detector of flu epidemics than CDC reports are.

I believe it would be just as accurate in detecting other kinds of contagion sweeping through the world: fads, emerging trends, scares, rumors, accidents, disasters – this is the kind of information that businesses need to know when it involves their products, their brands, or their markets.
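To make “detecting” concrete, here is a minimal sketch of one simple approach: flag any day whose count sits far above the trailing average. This is a generic spike detector chosen for illustration, not a description of how Google Flu Trends works, and the window and threshold defaults are arbitrary assumptions.

```python
from statistics import mean, stdev


def find_spikes(daily_counts, window=7, threshold=3.0):
    """Return the indexes of days whose count exceeds the mean of the
    previous `window` days by more than `threshold` standard deviations."""
    spikes = []
    for day in range(window, len(daily_counts)):
        history = daily_counts[day - window:day]
        baseline = mean(history)
        spread = stdev(history)
        excess = daily_counts[day] - baseline
        if spread == 0:
            # Perfectly flat history: treat any increase as a spike.
            is_spike = excess > 0
        else:
            is_spike = excess / spread > threshold
        if is_spike:
            spikes.append(day)
    return spikes
```

Feed it daily search or mention counts for a brand name and it returns the days worth a closer look. One known weakness: a big spike inflates the trailing window, so a second spike a few days later is harder to flag.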

Classic GI=GO Equation Holds True for Web Analytics

Garbage In = Garbage Out. People who spend their working hours analyzing numbers generally come to this realization. It is true for modeling, it is true for forecasting, and it is completely true when it comes to website analytics.

The chain of events looks something like this:

1. Someone visits a website integrated with a web analytics platform like Google Analytics, Webtrends or Omniture.
2. A web page visitor either navigates to a tracked page or performs a tracked action.
3. A script is executed in the browser, sending data to the analytics platform.
4. The data is added to the datastore.
5. The data is summarized and analyzed.

Problems arise when you assume that steps 1 through 4 are happening correctly, and you move right on to looking at reports and data that come out of the process. Oddly, most site developers I have met who are instrumenting a site for web analytics consider their job done and successful if tags fire when they are expected to. They don’t look at the data as it is passed with the tags and they don’t look and see what made it into the web analytics platform’s datastore. Anything you don’t check in software development is frequently going to be wrong. If your data is wrong, then all your analysis of it will be just as wrong as the data. Again, there’s the classic equation describing this relationship:

Garbage In = Garbage Out

How do you prevent your data from being garbage? QA and debug the data, that’s how.

Before you use information coming from a web analytics solution, you should (or someone should) do these two tests:

1. Web Analytics Data Test Number One: Is the data being passed correctly?

Use a header tool of some kind to see what tags are being invoked and what kind of data they are passing to the web analytics platform. I use WASP. It shows you what kind of tag is fired when you click on navigation and site functions, and then it lists the data values the tag passes. The test is this:

Step 1: Navigate to every page in the site. A pageview should be recorded for every page you view, and it should have the correct page name passed with it. Implementation of this is usually OK for standard HTML sites, but is error-prone for Flash sites.

Step 2: Click every function you are tracking as an action or event. See that an action or event is generated for each one you click, and that it is firing a tag that classifies it correctly – as an event, not a page view, and that the name and category that are assigned to the action are what they should be.

(Steps 3-n): Anything else you have tagged for measurement, like ad placements for an ad server, should also be clicked systematically to see that everything that is supposed to be captured about ad impressions is actually captured and passed when the tag is fired.
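If you capture the tag requests with your header tool, checking them against what you expected can be scripted. The sketch below assumes the tag passes its data as URL query parameters. The hostname and the parameter names (`type`, `category`, `name`) are made up for the example; substitute whatever your analytics platform actually sends.

```python
from urllib.parse import parse_qs, urlsplit


def check_tag(request_url, expected):
    """Compare a captured tag request's query parameters with expected values.

    Returns a list of human-readable problems; an empty list means the
    tag passed everything in `expected` correctly.
    """
    params = parse_qs(urlsplit(request_url).query, keep_blank_values=True)
    problems = []
    for name, want in expected.items():
        got = params.get(name)
        if got is None:
            problems.append(f"missing parameter {name!r}")
        elif len(got) > 1:
            problems.append(f"parameter {name!r} sent {len(got)} times")
        elif got[0] != want:
            problems.append(f"{name!r}: expected {want!r}, got {got[0]!r}")
    return problems
```

For example, `check_tag("https://stats.example.com/collect?type=pageview&name=play", {"type": "event", "category": "video", "name": "play"})` reports that `type` has the wrong value and `category` is missing, which is exactly the event-classified-as-page-view mistake described in Step 2.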

2. Web Analytics Data Test Number Two: Is the data making it into the database(s) correctly?

Set up your full site in staging so it will have a recognizable hostname that you can filter by in your reporting tool. Tell everyone else not to play with the version in staging for a while.

Step 1: Navigate to every page in the site in a systematic order. Do this several times. Make sure you keep track of how many times each page is viewed.

Step 2: Click every function you are tracking as an action or event. Do this several times. Make sure you keep track of how many times each action is done.

(Steps 3a-3n): Anything else you have tagged for measurement, like ad placements for an ad server, should also be clicked systematically – count the impressions and count the clicks.

Step 4: Click through every funnel you have set up, several times, all the way to the goal. If you have goals like time on site or number of pages viewed, make sure you stay long enough and look at enough pages to meet these goals.
If there are required pages in your funnels, make sure you pass through them. Again keep updating the tallies of page views and actions as you do all this.

Step 5: Wait until the data is likely to be available for reporting. Latency varies by platform. Pull reports, filtering for your hostname. You should see that the numbers of page views, actions, ad impressions, ad clicks, goals/conversions, and funnel stages match what you did in steps 1-4. If they do not, you probably have one of the following:
a. a tagging problem (wrong tag, misimplemented tag, redundant tags, etc.)
b. a setup problem (e.g. definitions for goals/conversions, funnels)
c. other users muddying up your data by hitting the site in staging while you are testing.
d. a more exotic and difficult problem
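The comparison in Step 5 is easy to do by eye for a small site and tedious for a large one. Here is a minimal sketch that reconciles the tallies you kept against the numbers in the report. The key format (for example `"pageview:/pricing"`) is just a labeling convention I made up for the example.

```python
from collections import Counter


def reconcile(performed, reported):
    """Compare the tallies you kept by hand with the counts in the report.

    Both arguments map an item label to a count. Returns
    {label: (performed_count, reported_count)} for every mismatch.
    """
    performed, reported = Counter(performed), Counter(reported)
    return {
        item: (performed[item], reported[item])
        for item in sorted(performed.keys() | reported.keys())
        if performed[item] != reported[item]
    }
```

The shape of each mismatch is a clue. A reported count that is exactly double what you did suggests redundant tags, a reported zero suggests a tag that never fired or was misclassified, and items you never touched suggest someone else was in staging.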

If this all sounds like a pain in the hindquarters, compare it to the pain of realizing that you have been reporting erroneous numbers and making business decisions based on them for months or years. Believe me, they could be so far off that you’d have been better off guessing or making numbers up. Do not trust what you cannot verify with test results, or you will have much pain and sadness in your future.

Listing Your Way to the Finish Line

Draft Marketing Analysis Checklist

After reading Atul Gawande’s recent book “The Checklist Manifesto”, I was thinking there should be a checklist for marketing analysis. One point that Dr. Gawande makes in his book is that highly-trained specialists shun checklists because in their minds only dummies need lists. However, a majority of surgeons, while rejecting lists for their own use, would want another surgeon to use one if operating on them. This is because they know how easy it is to forget one detail in hundreds.

In marketing analysis, there are a lot of steps and a lot of things to think about, and even a smart person might drop a stitch here or there if they are not following some kind of list. I have included a rough one I dashed off quickly, in hopes that others might offer refinements, altogether better lists, or more specific versions for types of marketing programs. Here it is, have at it!

DEFINE
• SET goals/ hypotheses for program
• SELECT metrics
• CREATE a measurement plan
EXECUTE
• EXECUTE program and measurement plan
• VALIDATE raw data
• PREPARE dataset for analysis
ANALYZE
• VISUALLY EXPLORE dataset for patterns and problems
• SUMMARIZE dataset statistics
• SCORE performance vs. goals/ support for hypotheses
• LIST likely conclusions
• IDENTIFY unexpected or surprising findings
• VALIDATE likely conclusions with numerical/statistical support
• SELECT final findings
COMMUNICATE
• REPORT findings for future activity
• REVIEW findings with user community
• CAPTURE questions & issues from user community
FOLLOW-UP
• INVESTIGATE user-identified questions & issues
• IDENTIFY impact on original findings
• REPORT findings of follow-up analysis

What do you think?

Conversion Optimization: It’s All About Action

Optimizing your website to maximize the number of page views or visitors, while sounding reasonable, may unwittingly have you wasting marketing dollars and effort on people who won’t buy anything or participate on your website (or your advertisers’ websites) in the foreseeable future.

When you spend time and money on your site content or on audience development for your site, you want to measure the impact of those changes both in terms of the number of desired actions taken by visitors to your site and in terms of the efficiency with which you are spending resources. The key measure you are tracking on the cost side is the ECPA, or effective cost per action. If you have a small site and are passive about audience development, perhaps it makes sense to optimize to Actions Per Visit (APV), or Actions Per Daily Unique visitor (APDU). But if you are spending serious time and money then you need to track these costs and what they generate.
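All three measures are simple ratios. The sketch below spells them out. The function names are mine, and for APDU I am assuming the denominator is the sum of each day's unique visitors over the reporting period.

```python
def ecpa(total_cost, actions):
    """Effective cost per action: dollars spent per desired action recorded."""
    if actions == 0:
        return float("inf")  # money spent, nothing to show for it
    return total_cost / actions


def actions_per_visit(actions, visits):
    """APV: desired actions divided by visits in the same period."""
    return actions / visits


def actions_per_daily_unique(actions, daily_unique_visitors):
    """APDU: desired actions divided by the period's total of daily
    unique visitors (assumed definition, see the note above)."""
    return actions / daily_unique_visitors
```

For example, $500 of spend that produced 40 sign-ups gives `ecpa(500.0, 40)`, or $12.50 per action. Whatever definitions you settle on, use the same period and the same action list for the numerator and the denominator.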

Lights! Camera! Actions!

Before this kind of thing makes any sense at all you have to define and start measuring the kinds of action you are trying to get visitors to take. Are you selling things? Are you getting paid for advertising shown on your site? Are you trying to develop leads for your business? Are you trying to get people to download something? Are you trying to get people to register or sign up? Whatever actions you want people to take on your site, they need to be measured if they are going to be the basis for your ECPA (or APV, APDU). Most of these things can be measured using Google Analytics.

In any case, once you have tagged or otherwise instrumented your site to capture your desired actions, then you can track ECPA (or APV, APDU) associated with your site.

Then when you make big changes, you can see whether they improved your site’s performance. You can measure the effectiveness of your SEM, your CPC campaigns on search engines, your affiliate programs, and your efforts to publicize your site.

Measuring Dollars Out per Dollar In

ECPA is a pretty good measure, but it only measures efficiency on the cost side. You also want to measure the return you get in dollars and cents. You can do this (or approximate it) if you can come up with a dollar value for each of your site’s target actions, using either an average value per action type or the actual value per action. Then you don’t need the oversimplification that focusing only on ECPA imposes. Simply put, not all actions on your site are worth the same amount, and it makes sense to spend more on actions that are worth more. What you ultimately want is an ROI. I’ll talk about that in a later post.
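As a preview of the arithmetic, here is a minimal sketch of “dollars out per dollar in” using an average value per action type. The action names and dollar values in the usage note are invented for illustration.

```python
def dollars_out_per_dollar_in(action_counts, value_per_action, total_cost):
    """Total value of the actions recorded, divided by what was spent.

    action_counts:    {action type: number of times it happened}
    value_per_action: {action type: average dollar value of one action}
    """
    total_value = sum(
        count * value_per_action[action]
        for action, count in action_counts.items()
    )
    return total_value / total_cost
```

With 10 purchases worth $40 each and 50 sign-ups worth $2 each on $250 of spend, the total value is $500 and the ratio is 2.0. A campaign with the same ECPA but made up entirely of $2 sign-ups would look identical on cost and far worse on return, which is the oversimplification described above.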