The Pitfalls of A/B Testing and Benchmarking

Improvement begins with measurement, but the ruler can also limit your audacity to try wildly new approaches (photo by Flickr user Thomas Favre-Bulle).
Google is famous for, among other things, crafting a deep, rich culture of A/B testing, the process of comparing the performance of two versions of a web site (or some other output) that differ in a single respect.

The benefit: changes to a web site or some other user interface are governed by real-world user behavior. If you can determine that your email newsletter signup button performs better with the label “Don’t Miss Out” instead of “Subscribe,” well, that’s an easy design change to make.
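For readers who want to see what "performs better" can mean in practice, here is a minimal sketch of one common way to compare two button labels: a two-proportion z-test. The function name and the signup counts are hypothetical, chosen purely for illustration, and a real test would also need a sample size decided in advance.

```python
from math import sqrt, erf


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates.

    conv_a / n_a: signups and visitors for variant A (e.g. "Subscribe")
    conv_b / n_b: signups and visitors for variant B (e.g. "Don't Miss Out")
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate under the assumption that the two labels perform the same
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        # No variation at all (everyone or no one converted): nothing to test
        return 0.0, 1.0
    z = (rate_b - rate_a) / std_err
    # Standard normal CDF written with the error function
    normal_cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    p_value = 2 * (1 - normal_cdf)
    return z, p_value


# Hypothetical numbers: 2,400 visitors saw each label
z, p = two_proportion_z_test(conv_a=120, n_a=2400, conv_b=156, n_b=2400)
print(f"z = {z:.2f}, p = {p:.3f}")
```

A small p-value suggests the gap between the two labels is unlikely to be noise alone. It says nothing about whether a completely different signup experience would have beaten both.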

The practice of benchmarking – using industry standards or averages as a point of comparison for your own performance – has some strong similarities to A/B testing. It’s an analytic tool that helps frame and drive performance-based testing and iteration. The comparison of your organization’s performance to industry benchmarks (e.g., email open rates, average donation value on a fundraising drive) provides the basis for a feedback loop.
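The benchmark comparison can be sketched the same way. The snippet below, a rough illustration rather than a recommended method, puts an approximate 95% confidence interval around your own rate and checks where an industry figure falls relative to it. The function name, the 20% benchmark, and the send and open counts are all made up for the example.

```python
from math import sqrt


def rate_vs_benchmark(successes, trials, benchmark, z_crit=1.96):
    """Compare an observed rate with a benchmark rate.

    Returns (rate, (low, high), verdict), where (low, high) is a
    normal-approximation confidence interval (95% when z_crit is 1.96).
    """
    rate = successes / trials
    margin = z_crit * sqrt(rate * (1 - rate) / trials)
    low, high = rate - margin, rate + margin
    if low > benchmark:
        verdict = "above benchmark"
    elif high < benchmark:
        verdict = "below benchmark"
    else:
        verdict = "indistinguishable from benchmark"
    return rate, (low, high), verdict


# Hypothetical campaign: 540 opens out of 3,000 emails, 20% assumed benchmark
rate, interval, verdict = rate_vs_benchmark(540, 3000, benchmark=0.20)
print(f"open rate {rate:.1%}, interval {interval[0]:.1%} to {interval[1]:.1%}: {verdict}")
```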

The two practices – A/B testing and benchmarking – share a hazard, however. Because a culture of A/B testing is driven by real-time empirical results, and because it generally depends on comparisons between two options that are identical in every respect but one (the discrete element that you are testing), it privileges modest, incremental changes at the expense of audacious leaps.

To use a now-classic business comparison: while Google lives and breathes A/B testing, and constantly refines its way to small performance improvements, the Steve Jobs-era Apple eschewed consumer testing, assuming (with considerable success) that the consumer doesn’t know what it wants and actually requires an audacious company like Apple to redefine product categories altogether.

Similarly, if your point of reference is a collection of industry standards, you are more likely to aim for, and be satisfied with, performance that meets those standards. The industry benchmarks, like the incremental change model that undergirds A/B testing, may actually constrain your creativity and ambition, impeding your ability to think audaciously about accomplishing something fundamentally different from the other players in your ecosystem, or accomplishing your goals in a profoundly different way.

The implication isn’t that you should steer clear of A/B testing or benchmarking. Both are powerful tools that can help nonprofits focus, refine, and learn more quickly. But you should be aware of the hazards and make sure that, even as you improve your iterative cycles, you are also protecting your ability to think big and think different about the work your organization does.

And if you want to dive in, there are a ton of great resources on the web, including a series of posts on A/B testing by the 37signals guys (Part 1, Part 2, and Part 3), the “Ultimate Guide to A/B Testing” on Smashing Magazine, an A/B testing primer on A List Apart, Beth Kanter’s explanation of benchmarking, and the 2012 Nonprofit Social Network Report.
