The Pitfalls of A/B Testing and Benchmarking

Improvement begins with measurement, but the ruler can also limit your audacity to try wildly new approaches (photo by Flickr user Thomas Favre-Bulle).

Google is famous for, among other things, crafting a deep, rich culture of A/B testing, the process of comparing the performance of two versions of a web site (or some other output) that differ in a single respect.

The benefit: changes to a web site or some other user interface are governed by real-world user behavior. If you can determine that your email newsletter signup button performs better with the label “Don’t Miss Out” instead of “Subscribe,” well, that’s an easy design change to make.
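If you want to sanity-check a result like that yourself, the arithmetic is simple. Below is a minimal sketch in Python of a rough two-proportion z-test comparing the two button labels; the visitor and signup counts are hypothetical, and most A/B testing tools will run this calculation for you.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Rough two-proportion z-test for two conversion rates.
    Returns the z score and a two-sided p-value; a small p-value
    suggests the difference is unlikely to be chance alone."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical counts: A = "Subscribe", B = "Don't Miss Out"
signups_a, visitors_a = 120, 5000
signups_b, visitors_b = 155, 5000
z, p = two_proportion_z_test(signups_a, visitors_a, signups_b, visitors_b)
print(f"A: {signups_a / visitors_a:.2%}  B: {signups_b / visitors_b:.2%}")
print(f"z = {z:.2f}, p = {p:.3f}")
```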

The practice of benchmarking – using industry standards or averages as a point of comparison for your own performance – has some strong similarities to A/B testing. It’s an analytic tool that helps frame and drive performance-based testing and iteration. The comparison of your organization’s performance to industry benchmarks (e.g., email open rates, average donation value on a fundraising drive) provides the basis for a feedback loop.
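To make that feedback loop concrete, here's a minimal sketch in Python that compares a couple of campaigns' open rates against a placeholder industry benchmark; every figure in it is made up for illustration, not a real benchmark.

```python
# Hypothetical figures: your campaigns' results vs. an industry benchmark.
campaigns = {
    "spring_appeal": {"sent": 12000, "opens": 2100},
    "summer_update": {"sent": 11800, "opens": 1700},
}
BENCHMARK_OPEN_RATE = 0.17  # placeholder average, not a real industry figure

for name, stats in campaigns.items():
    open_rate = stats["opens"] / stats["sent"]
    gap = open_rate - BENCHMARK_OPEN_RATE
    print(f"{name}: {open_rate:.1%} open rate ({gap:+.1%} vs. benchmark)")
```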

The two practices – A/B testing and benchmarking – share a hazard, however. Because a culture of A/B testing is driven by real-time empirical results, and because it generally depends on comparisons between two options that are identical in every respect but one (the discrete element that you are testing), it privileges modest, incremental changes at the expense of audacious leaps.

To use a now-classic business comparison: while Google lives and breathes A/B testing, and constantly refines its way to small performance improvements, the Steve Jobs-era Apple eschewed consumer testing, assuming (with considerable success) that consumers don’t know what they want and actually require an audacious company like Apple to redefine product categories altogether.

Similarly, if your point of reference is a collection of industry standards, you are more likely to aim for and be satisfied with performance that meets those standards. The industry benchmarks, like the incremental-change model that undergirds A/B testing, may actually constrain your creativity and ambition, impeding your ability to think audaciously about accomplishing something fundamentally different from the other players in your ecosystem, or accomplishing your goals in a profoundly different way.

The implication isn’t that you should steer clear of A/B testing or benchmarking. Both are powerful tools that can help nonprofits focus, refine, and learn more quickly. But you should be aware of the hazards, and make sure even as you improve your iterative cycles you are also protecting your ability to think big and think different about the work your organization does.

And if you want to dive in, there are a ton of great resources on the web, including a series of posts on A/B testing by the 37Signals guys (Part 1, Part 2, and Part 3), the “Ultimate Guide to A/B Testing” on SmashingMagazine, an A/B testing primer on A List Apart, Beth Kanter’s explanation of benchmarking, and the 2012 Nonprofit Social Network Report.

Testing, testing, testing…is this message on?

So you’re sitting there lamenting the somewhat lethargic results of recent email campaigns and wondering if a little tweak to your email or landing page would improve results. Maybe change up the subject line – add or remove the organization’s name. Maybe add a photo or two to the message. Maybe change the placement of a link or form or call to action on a landing page. Would that get more conversions, you wonder? There has to be an easy way to bump this up, you think to yourself.

Photo by Sebastian Bergmann, Flickr.

So you post an email to a handy email list largely made up of folks doing similar work, asking whether a subject line change would help. The feedback is extensive but largely anecdotal. Hardly anyone offers up actual data, and most of the stories are second- or third-hand… “well, a group I used to work with put the name of the organization in the subject line and it helped a little, I think.”

And you think, “well, that’s good, but it’s not exactly the same situation I’m dealing with here. It’s a good story, but it doesn’t exactly apply to my list.”

My god, man… then why not test it on your list!

The thing is, testing on one’s own list and pages is pretty darn easy (though we can make it quite complicated and involved, sometimes for good reason and other times not), yet it’s rarely done.
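To show how little is involved, here’s a rough sketch of the core move in Python: randomly split a list export in half so each half can get a different subject line. The file name and column name are assumptions about your export, and most email tools will handle this split for you.

```python
import csv
import random

# Hypothetical export: subscribers.csv with an "email" column.
# Most email/CRM tools can do this split for you; this just shows the idea.
with open("subscribers.csv", newline="") as f:
    subscribers = [row["email"] for row in csv.DictReader(f)]

random.seed(42)               # fixed seed so the split is reproducible
random.shuffle(subscribers)

half = len(subscribers) // 2
group_a = subscribers[:half]  # gets the current subject line
group_b = subscribers[half:]  # gets the new subject line

print(f"Group A: {len(group_a)} addresses; Group B: {len(group_b)} addresses")
```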

Okay, so the ease of testing depends on the tools at hand. If your email system/online CRM is pretty unwieldy, or you just don’t know how to use it, then little tweaks here and there can be massive potholes in the road, not small bumps. If you don’t know how to move things around on your site – or don’t have the staff to do so – then little changes can be tough.

Yet what I’ve found more often is a lack of interest or curiosity about testing. Folks are resigned to the results they get or, if they’re not sure, just don’t know how to proceed. What to test? How to set it up? Is it worth the time?

Valid questions all. But I think the lack of a learning culture is more the culprit. More on that in a future post, but first… what to test.