Our view at Stack: Shopify has just about everything you need if you're looking to sell online. It excels with unlimited products, user-friendly setup, and 24/7 support. It offers 6,000+ app integrations, abandoned cart recovery, and shipping discounts of up to 88%. Plus, it allows selling both online and in person, scaling as your business grows.
Learn about everything from A/B tests, conversion research, and idea prioritization to test analysis and archive management from experts at Google, HubSpot, and Shopify.
Whether you’re an experienced entrepreneur or just getting started, there’s a good chance you’ve seen countless articles and resources about A/B testing. You might already A/B test your email subject lines or your social media posts.
Although plenty has been said about A/B testing in the field of marketing, many entrepreneurs stumble in practice. The result? Major business decisions based on inaccurate results from an improperly run test.
A/B testing is often oversimplified, especially in content written for store owners. Ahead, you’ll find everything you need to get started with different types of A/B testing for ecommerce, explained as plainly—but usefully—as possible. A/B testing can be a game changer for choosing the right product positioning, increasing conversions on a landing page, and so much more.
What is A/B testing?
A/B testing, also referred to as split testing or bucket testing, is the process of comparing two versions of the same web page, email, or other digital asset to determine which one performs better based on user behavior.
It’s a useful tool for improving the performance of a marketing campaign and better understanding what converts your target audience. A/B testing allows you to answer important business questions, helps you generate more revenue from the traffic you already have, and sets the foundation for a data-informed marketing strategy.
How A/B testing works
- Define your goal. Establish your goals for the A/B test, such as increasing conversions, click-through rates, or overall sales.
- Choose the element to test. You can test headlines, images, email subject lines, calls to action (CTAs), pricing, layouts, and more.
- Create variations. Develop two versions of the element: Version A, the original version of your asset, known as the “control,” and Version B, the new version with the changes you want to test, known as the “variant.” In a typical marketing test, you show 50% of visitors Version A and 50% of visitors Version B.
- Run the test. Show each group its version over the same predetermined period. For example, if you’re testing an ecommerce site’s homepage CTA button, you might run the test for two weeks to achieve statistically significant results.
- Collect data. Monitor and measure conversions, click-throughs, engagement levels, and sales across both versions.
- Analyze the results. Compare the performance of Version A versus Version B to determine which more effectively meets your goal. The version with the higher conversion rate wins.
- Declare the winner. If Version B has the higher conversion rate, declare it the winner and send 100% of visitors there. It becomes the new control, and you design another variant for future tests.
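As a quick illustration, the steps above can be sketched in a few lines of Python. The hashing trick, visitor IDs, and tallies below are all hypothetical; in practice a dedicated testing tool handles the bucketing for you.

```python
import hashlib

def assign_variant(visitor_id: str) -> str:
    """Hash the visitor ID so the same visitor always sees the same version."""
    bucket = int(hashlib.md5(visitor_id.encode()).hexdigest(), 16) % 2
    return "A" if bucket == 0 else "B"

def conversion_rate(conversions: int, visitors: int) -> float:
    return conversions / visitors if visitors else 0.0

# Example tallies after the test period (illustrative numbers only):
results = {"A": {"visitors": 4980, "conversions": 149},
           "B": {"visitors": 5020, "conversions": 186}}
for version, r in results.items():
    print(version, f"{conversion_rate(r['conversions'], r['visitors']):.2%}")
```

Deterministic hashing matters: if a returning visitor were re-bucketed at random, they could see both versions and pollute the results.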
💡Consideration: An A/B test conversion rate can often be an imperfect measure of success.
For example, if you price an item for $50 on one page and it’s completely free on the other, that won’t provide any truly valuable insight. As with any tool or strategy you use for your business, it has to be strategic.
That’s why you should track the value of a conversion all the way through to the final sale.
When you should A/B test
If you’re running a low-traffic site or a web or mobile app, A/B testing is probably not the best optimization effort for you. You will likely see a higher return on investment (ROI) from conducting user testing or talking to your customers, for example. Despite popular belief, conversion rate optimization does not begin and end with testing.
How long should a test run? You want to run tests for at least two full business cycles, which usually works out to two to four weeks. Now maybe you’re thinking, “No problem, I’ll run the test for longer than two to four weeks to reach the required sample size.” That won’t work either.
The longer a test is running, the more susceptible it is to external validity threats and sample pollution. For example, visitors might delete their cookies and end up re-entering the A/B test as new visitors. Or someone could switch from their mobile phone to a desktop and see an alternate variation.
Essentially, letting your test run for too long can skew results just as much as not letting it run long enough.
Testing is worth the investment for stores that can meet the required sample size in two to four weeks. Stores that can’t should consider other forms of optimization until their traffic increases.
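To check whether testing is worth it for your store, you can estimate how long it would take to reach a given sample size. This is a rough sketch with made-up numbers; the required sample size per variation would come from a sample size calculator.

```python
def weeks_to_sample(required_per_variation: int,
                    weekly_visitors: int,
                    variations: int = 2) -> float:
    """How many weeks until every variation reaches the required sample size."""
    visitors_needed = required_per_variation * variations
    return visitors_needed / weekly_visitors

# Illustrative numbers: 4,000 visitors/week, 10,000 needed per variation.
weeks = weeks_to_sample(10_000, 4_000)
print(f"{weeks:.1f} weeks")  # longer than four weeks: consider other research first
```

If the answer comes out well past four weeks, other optimization methods (user testing, customer interviews) will likely pay off faster.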
Set up your A/B testing process
Prioritize A/B test ideas
A huge list of A/B testing examples is exciting, though unhelpful for deciding what to test. Where do you start? That’s where prioritization comes in.
There are a few common A/B testing prioritization frameworks you can use:
- ICE. ICE stands for impact, confidence, and ease. Each of those factors receives a ranking of 1 to 10. For example, if you could easily run the test by yourself without help from a developer or designer, you might give ease an 8. You’re using your judgment here, and if you have more than one person running tests, rankings may become too subjective. It helps to have a set of guidelines to keep everyone objective.
- PIE. PIE stands for potential, importance, and ease. Again, each factor receives a 1 to 10 ranking. For example, if the test will reach 90% of your traffic, you might give importance an 8. PIE is as subjective as ICE, so guidelines can be helpful for this framework as well.
- PXL. PXL is the prioritization framework from educational platform CXL. It’s a little bit different and more customizable, forcing more objective decisions. Instead of three factors, you’ll find Yes/No questions and an ease-of-implementation question. For example, the framework might ask: “Is the test designed to increase motivation?” If yes, it gets a 1. If no, it gets a 0. CXL also offers a spreadsheet template for applying the framework.
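A minimal ICE-style scoring pass might look like the sketch below. The ideas and the 1-to-10 scores are invented, and summing the three factors is just one common convention (some teams multiply instead).

```python
# Hypothetical backlog of test ideas with judgment-based ICE scores (1-10).
ideas = [
    {"idea": "Rewrite homepage CTA copy",    "impact": 7, "confidence": 6, "ease": 8},
    {"idea": "Redesign checkout flow",       "impact": 9, "confidence": 5, "ease": 2},
    {"idea": "Test free-shipping threshold", "impact": 8, "confidence": 7, "ease": 6},
]

# Combine the three factors into a single priority score.
for idea in ideas:
    idea["ice"] = idea["impact"] + idea["confidence"] + idea["ease"]

# Highest combined score first.
for idea in sorted(ideas, key=lambda i: i["ice"], reverse=True):
    print(f'{idea["ice"]:>2}  {idea["idea"]}')
```

The scores themselves are subjective, which is exactly why shared scoring guidelines matter when more than one person ranks the backlog.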
Once you have an idea of where to start, it can also help to categorize your ideas. For example, during some conversion research you could use three categories: implement, investigate, and test.
- Implement. Just do it. It’s broken or obvious.
- Investigate. Requires extra thought to define the problem or narrow in on a solution.
- Test. The idea is sound and data informed. Test it!
Between this categorization and prioritization, you’ll be set to start A/B testing.
Develop a hypothesis
Before you test anything, you need to have a hypothesis. For example, “If I lower what I charge for shipping, conversion rates will increase.”
Don’t worry—forming a hypothesis in this situation isn’t as complicated as it may sound. Basically, you need to test a hypothesis, not an idea. A hypothesis is measurable, aspires to solve a specific conversion problem, and focuses on insights instead of wins.
Whenever you write a hypothesis, it helps to use a formula borrowed from Craig Sullivan’s Hypothesis Kit:
- Because you see [insert data/feedback from research]
- You expect that [change you’re testing] will cause [impact you anticipate], and
- You’ll measure this using [data metric]
Easy, right? All you have to do is fill in the blanks and your A/B test idea has transformed into a hypothesis.
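If you want to keep your hypotheses consistent across tests, a tiny helper can fill in the blanks of the Hypothesis Kit formula. The example inputs below are hypothetical.

```python
def hypothesis(evidence: str, change: str, impact: str, metric: str) -> str:
    """Fill in the blanks of the Hypothesis Kit template."""
    return (f"Because we see {evidence}, "
            f"we expect that {change} will cause {impact}, "
            f"and we'll measure this using {metric}.")

print(hypothesis(
    evidence="40% of survey respondents citing shipping cost as a blocker",
    change="lowering the flat shipping rate from $9 to $5",
    impact="more completed checkouts",
    metric="checkout conversion rate",
))
```

Forcing every idea through the same template makes it obvious when a test has no supporting evidence or no measurable metric.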
Choose an A/B testing tool
Now you can start choosing an A/B testing tool or split testing service. More often than not, you’ll think of Google Optimize, Optimizely, and VWO first. All are good, safe options.
Here’s more information about those popular A/B testing tools:
- Google Optimize. Free, save for some multivariate limitations, which shouldn’t really impact you if you’re just getting started. It works well when performing Google Analytics A/B testing, which is a plus.
- Optimizely. Easy to get minor tests up and running, even without technical skills. Stats Engine makes it easier to analyze test results. Typically, Optimizely is the most expensive option of the three.
- VWO. VWO has SmartStats to make analysis easier. Plus, it has a great WYSIWYG editor for beginners. Every VWO plan comes with heat maps, on-site surveys, form analytics, etc.
There are also A/B testing tools in the Shopify App Store you might find helpful.
Once you’ve selected an A/B testing tool or split-testing software, fill out the sign-up form and follow the instructions provided. The process varies from tool to tool. Typically, though, you’ll be asked to install a snippet on your site and set goals.
Decide how to analyze results
If you craft your hypothesis correctly, even a loser is a winner, because you’ll gain insights you can use for future tests and in other areas of your business. So, when you’re analyzing your test results, you need to focus on the insights, not whether the test won or lost. There’s always something to learn, always something to analyze. Don’t dismiss the losers!
The most important thing to note here is the need for segmentation. A test might be a loser overall, but chances are it performed well with at least one audience segment.
Here are some examples of audience segments:
- New visitors
- Returning visitors
- iOS visitors
- Android visitors
- Chrome visitors
- Safari visitors
- Desktop visitors
- Tablet visitors
- Organic search visitors
- Paid visitors
- Social media visitors
- Logged-in buyers
You get the idea, right?
Odds are that the hypothesis was proven right among certain segments. That tells you something as well.
Analysis is about so much more than whether the test was a winner or a loser. Segment your data to find hidden insights below the surface.
A/B testing software won’t do this analysis for you, so this is an important skill to develop over time.
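Since most tools won’t segment for you, a simple tally like the sketch below can surface per-segment results. The visitor log and segment names are illustrative.

```python
from collections import defaultdict

# Hypothetical per-visitor test log: (segment, variant, converted)
log = [
    ("new", "A", False), ("new", "B", True),  ("new", "B", True),
    ("returning", "A", True), ("returning", "B", False),
    ("mobile", "B", True), ("mobile", "A", False),
]

# Tally visitors and conversions per (segment, variant) pair.
tally = defaultdict(lambda: {"visitors": 0, "conversions": 0})
for segment, variant, converted in log:
    t = tally[(segment, variant)]
    t["visitors"] += 1
    t["conversions"] += converted  # True counts as 1

for (segment, variant), t in sorted(tally.items()):
    rate = t["conversions"] / t["visitors"]
    print(f"{segment:<10} {variant}  {rate:.0%} ({t['visitors']} visitors)")
```

Keep in mind that slicing a test into many small segments shrinks each sample, so treat segment-level wins as leads for follow-up tests rather than conclusions.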
Archive your test results
Say you run your first test tomorrow. Two years from tomorrow, will you remember the details of that test? Not likely.
That’s why archiving your A/B testing results is important. Without a well-maintained archive, all those insights you’re gaining will be lost. Plus, it’s very easy to test the same thing twice if you’re not archiving.
There’s no “right” way to do this, though. You could use a tool like Effective Experiments, or you could use a simple spreadsheet. It’s really up to you, especially when you’re just getting started.
Whatever tool you use, make sure you’re keeping track of:
- The tested hypothesis
- Screenshots of the control and variation
- Whether it won or lost
- Insights gained through analysis
As you grow, you’ll thank yourself for keeping this archive. Not only will it help you, it will help new hires and advisers/stakeholders as well.
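A plain spreadsheet archive can be as simple as appending rows to a CSV file. The columns mirror the checklist above; the file name and example row are hypothetical.

```python
import csv

# Columns for the archive (mirroring the checklist of what to track).
FIELDS = ["date", "hypothesis", "screenshots", "outcome", "insights"]

def archive_test(path: str, row: dict) -> None:
    """Append one finished test to the archive, writing headers on first use."""
    try:
        new_file = open(path).read() == ""
    except FileNotFoundError:
        new_file = True
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if new_file:
            writer.writeheader()
        writer.writerow(row)

archive_test("ab_test_archive.csv", {
    "date": "2024-03-01",
    "hypothesis": "Lower shipping cost increases checkout conversion",
    "screenshots": "drive/shipping-test/",  # link to control + variant images
    "outcome": "winner",
    "insights": "Lift concentrated in first-time mobile buyers",
})
```

A dedicated tool adds search and sharing, but even this bare-bones version prevents you from unknowingly re-running the same test.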
A/B testing examples
Technical analysis
Does your store load properly and quickly on every browser? On every device? You might have a shiny new smartphone, but someone somewhere is still rocking a flip phone from 2005. If your site doesn’t work properly and quickly, it definitely doesn’t convert as well as it could.
On-site surveys
These pop up as your store’s visitors browse around. For example, an on-site survey might ask visitors who have been on the same page for a while if there’s anything holding them back from making a purchase today. If so, what is it? You can use this qualitative data to improve your copy and conversion rate.
Customer interviews
Nothing can replace getting on the phone and talking to your customers. Why did they choose your store over competing stores? What problem were they trying to solve when they arrived on your site? There are a million questions you could ask to get to the heart of who your customers are and why they really buy from you.
Customer surveys
Customer surveys are full-length surveys that go out to people who have already made a purchase (as opposed to visitors). When designing a customer survey, you want to focus on: defining your customers, defining their problems, defining hesitations they had prior to purchasing, and identifying words and phrases they use to describe your store.
Analytics analysis
Are your analytics tools tracking and reporting your data properly? That might sound silly, but you’d be surprised by how many analytics tools are configured incorrectly. Analytics analysis is all about figuring out how your visitors behave. For example, you might focus on the funnel: Where are your biggest conversion funnel leaks? In other words, where are most people dropping out of your funnel? That’s a good place to start testing.
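Finding the biggest funnel leak is just a matter of comparing step-to-step drop-off rates. The funnel counts below are illustrative.

```python
# Visitor counts at each funnel step (illustrative numbers).
funnel = [("product page", 10_000), ("add to cart", 3_200),
          ("checkout", 1_400), ("purchase", 900)]

worst = None
for (step, n), (next_step, next_n) in zip(funnel, funnel[1:]):
    drop = 1 - next_n / n  # share of visitors lost between the two steps
    print(f"{step} -> {next_step}: {drop:.0%} drop off")
    if worst is None or drop > worst[1]:
        worst = (f"{step} -> {next_step}", drop)

print("Biggest leak:", worst[0])  # a good place to start testing
```

The step with the steepest drop-off is where a test is most likely to move revenue, since the most visitors are being lost there.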
User testing
This is where you watch real people in a paid, controlled experiment try to perform tasks on your site. For example, you might ask them to find a video game in the $40 to $60 range and add it to their cart. While they’re performing these tasks, they narrate their thoughts and actions out loud.
Session replays
Session replays are similar to user testing, but now you’re dealing with real people with real money and real intent to buy. You’ll watch as your actual visitors navigate your site. What do they have trouble finding? Where do they get frustrated? Where do they seem confused?
There are additional types of research as well, but start by choosing the best A/B testing method for you. If you run through some of them, you will have a huge laundry list of data-informed ideas worth testing.
📚Learn more: 7 Actionable A/B Testing Examples for Your Ecommerce Store
A/B testing processes of the pros
Now that you’ve been through a standard A/B testing tutorial, let’s take a look at the exact processes of pros.
Krista Seiden, KS Digital
My step-by-step process for web and app A/B testing starts with analysis—in my opinion, this is the core of any good testing program. In the analysis stage, the goal is to examine your analytics data, survey or UX data, or any other sources of customer insight you might have in order to understand where your opportunities for optimization are.
Once you have a good pipeline of ideas from the analysis stage, you can move on to hypothesize what might be going wrong and how you could potentially fix or improve these areas of optimization.
Next, it’s time to build and run your tests. Be sure to run them for a reasonable amount of time (I default to two weeks to ensure I’m accounting for week-over-week changes or anomalies), and when you have enough data, analyze your results to determine your winner.
It’s also important to take some time in this stage to analyze the losers as well—what can you learn from these variations?
Finally, and you may only reach this stage once you’ve spent time laying the groundwork for a solid optimization program, it’s time to look into personalization. This doesn’t necessarily require a fancy tool set, but rather can come out of the data you have about your users.
Marketing personalization can be as easy as targeting the right content to the right locations or as complex as targeting based on individual user actions. Don’t jump in all at once on the personalization bit though. Be sure you spend enough time to get the basics right first.
Alex Birkett, Omniscient Digital
At a high level, I try to follow this process:
- Collect data and make sure analytics implementations are accurate.
- Analyze data and find insights.
- Turn insights into hypotheses.
- Prioritize based on impact and ease, and maximize allocation of resources (especially technical resources).
- Run a test (following statistics best practices to the best of my knowledge and ability).
- Analyze results and implement or not according to the results.
- Iterate based on findings, and repeat.
Put more simply: research, test, analyze, repeat.
While this process can deviate or change based on what the context is (Am I testing a business-critical product feature? A blog post CTA? What’s the risk profile and balance of innovation versus risk mitigation?), it’s pretty applicable to any size or type of company.
The point is this process is agile, but it also collects enough data, both qualitative customer feedback and quantitative analytics, to be able to come up with better test ideas and better prioritize them so you can drive traffic to your online store.
Ton Wesseling, Online Dialogue
The first question we always answer when we want to optimize a customer journey is: Where does this product or service fit on the ROAR model we created at Online Dialogue? Are you still in the risk phase, where we can do lots of research but can’t validate our findings through online A/B experiments (below 1,000 conversions per month), or are you in the optimization phase? Or even above?
- Risk phase: Lots of research, which will be translated into anything from a business model pivot to a whole new design and value proposition.
- Optimization phase: Large experiments that will optimize the value proposition and the business model, as well as small experiments to validate user behavior hypotheses, which will build up knowledge for larger design changes.
- Automation: You still have experimentation power (visitors) left, meaning your full test potential is not needed to validate your user journey. What’s left should be used to exploit, to grow faster now (without focus on long-term learnings). This could be automated by running bandits/using algorithms.
- Re-think: You stop adding lots of research, unless it’s a pivot to something new.
So web or app A/B testing is only a big thing in the optimization phase of ROAR and beyond (until re-think).
Our approach to running experiments is the FACT & ACT model, and the research we do is based on our 5V Model.
We gather all these insights to come up with a main research-backed hypothesis, which will lead to sub-hypotheses that will be prioritized based on the data gathered through either desktop or mobile A/B testing. The higher the chance of the hypothesis being true, the higher it will be ranked.
Once we learn if our hypothesis is true or false, we can start combining learnings and take bigger steps by redesigning/realigning larger parts of the customer journey. However, at some point, all winning implementations will lead to a local maximum. Then you need to take a bigger step to be able to reach a potential global maximum.
And, of course, the main learnings will be spread throughout the company, which leads to all sorts of broader optimization and innovation based on your validated first-party insights.
Julia Starostenko, Pinterest
The purpose of an experiment is to validate that making changes to an existing webpage will have a positive impact on the business.
Before getting started, it’s important to determine if running an experiment is truly necessary. Consider the following scenario: There is a button with an extremely low click rate. It would be near impossible to decrease the performance of this button. Validating the effectiveness of a proposed change to the button (i.e., running an experiment) is therefore not necessary.
Similarly, if the proposed change to the button is small, it probably isn’t worth spending the time setting up, executing, and tearing down an experiment. In this case, the changes should just be rolled out to everyone and performance of the button can be monitored.
If it is determined that running an experiment would in fact be beneficial, the next step is to define the business metrics that should be improved (e.g., increase the conversion rate of a button). Then we ensure that proper data collection is in place.
Once this is complete, the audience is randomly split into two groups: one group is shown the existing version of the button, while the other group gets the new version. The conversion rate of each audience is monitored, and once statistical significance is reached, the results of the experiment are determined.
Peep Laja, CXL
A/B testing is a part of a bigger conversion optimization picture. In my opinion, it’s 80% about the research and only 20% about testing. Conversion research will help you determine what to test to begin with.
My process typically looks like this (a simplified summary):
- Conduct conversion research using a framework like ResearchXL to identify issues on your site.
- Pick a high-priority issue (one that affects a large portion of users and is severe), and brainstorm as many solutions to this problem as you can. Inform your ideation process with your conversion research insights. Determine which device you want to run the test on (you need to run mobile A/B testing separately from desktop).
- Determine how many variations you can test (based on your traffic/transaction level), and then pick your best one to two ideas for a solution to test against control.
- Wireframe the exact treatments (write the copy, make the design changes, etc). Depending on the scope of changes, you might also need to include a designer to design new elements.
- Have your front-end developer implement the treatments in your testing tool. Set up necessary integrations (Google Analytics) and set appropriate goals.
- Conduct QA on the test (broken tests are by far the biggest A/B testing killer) to make sure it works with every browser/device combo.
- Launch the test!
- Once the test is done, conduct post-test analysis.
- Depending on the outcome, either implement the winner, iterate on the treatments, or go and test something else.
Common mistakes in A/B testing
Testing too many variables simultaneously
When you compare two variables at once, you might not be able to determine which change caused the effect.
Say you want to optimize a landing page. Rather than just testing a headline, you test:
- Call-to-action text
- CTA button color
- Header images
- Headlines
Conversion rates go up, but you can’t pinpoint what change was responsible. If you test one variable at a time, you could isolate the impact of each change and get more accurate results.
💡Consideration: Multivariate testing is an option if you want to understand how multiple variables interact with each other. But to run a multivariate test, you need more traffic and an already well-optimized page to make incremental improvements on. The process is much more complex than running an A/B test.
Insufficient sample size
The reliability of your A/B test results depends on the sample size used. Small samples can cause false positives and negatives, making it difficult to conclude if the differences are the result of your changes or random chance.
Imagine you are testing two versions of a product page to see which one leads to higher purchase rates. You split the traffic but only end up with 100 visitors to Version A and 100 visitors to Version B.
If Version A has a 6% conversion rate, and Version B has a 5% conversion rate, you may think Version A is better. But, with only 100 visitors per version, it’s not statistically significant. It’s possible that if you tested with more visitors, the results might have been different.
The best way to determine a healthy sample size is with a sample size calculator.
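For reference, the standard two-proportion formula behind most sample size calculators (here at 95% confidence and 80% power) shows why 100 visitors per version is far too few to separate 5% from 6%.

```python
from math import sqrt
from statistics import NormalDist

def sample_size(p1: float, p2: float,
                alpha: float = 0.05, power: float = 0.8) -> int:
    """Visitors needed per variation to detect p1 vs. p2 (two-sided test)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_b = NormalDist().inv_cdf(power)          # critical value for power
    p_bar = (p1 + p2) / 2
    n = ((z_a * sqrt(2 * p_bar * (1 - p_bar))
          + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2) / (p1 - p2) ** 2
    return int(n) + 1

# Detecting a 5% -> 6% lift takes thousands of visitors per variation, not 100.
print(sample_size(0.05, 0.06))
```

Note how sensitive the result is to the expected lift: detecting a jump from 5% to 10% needs only a few hundred visitors per variation, while 5% to 6% needs thousands.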
Short testing durations
Run your A/B test for at least one, ideally two, full business cycles. Don’t stop your test just because you’ve reached significance. You’ll also need to meet your predetermined sample size. Finally, don’t forget to run all tests in full-week increments.
Why two full business cycles? For starters, two cycles help you account for:
- “I need to think about it” buyers.
- Different traffic sources (Facebook, email newsletter, organic search, etc.)
- Anomalies. For example, your Friday email newsletter.
Two business cycles are generally enough time to get valuable insight into the user behavior of your target audience.
If you’ve used any sort of A/B testing tool on a landing page, you’re likely familiar with the little green “Statistically Significant” icon.
For many, unfortunately, that’s the universal sign for “the test is cooked, call it.” But just because statistical significance has been reached doesn’t mean you should stop the test.
Overlooking user segmentation
If you don’t consider different user segments, you’ll get generalized results that may not apply to everyone.
It’s helpful to segment users by demographics, behavior, or other relevant factors. What works for new users might not work for returning users. If you don’t segment, you’ll overlook key user groups and risk drawing conclusions that don’t hold for them.
Optimize A/B testing for your business
You have the process, you have the power! So, get out there, get the best A/B testing software, and start testing your store. Before you know it, those insights will add up to more money in the bank.
If you want to continue learning about optimization, consider taking a free course, such as Udacity’s A/B testing by Google. You can learn more about web and mobile app A/B testing to boost your optimization skill set.
A/B testing FAQ
What is A/B testing?
At the most basic level, A/B testing is testing two versions of something to see which performs better. You can A/B test a variety of things related to your business, including social media posts, content, email, and product pages.
What’s an example of A/B testing?
An example of A/B testing would be running paid traffic to two slightly different product pages to see which page has the highest conversion rate. To ensure your A/B tests can provide valuable insight, it’s recommended that you have traffic of more than 5,000 visitors to a given page.
Why do people use A/B testing?
A/B testing lets people test two versions of a webpage, app, or marketing campaign by showing different versions to different segments of users simultaneously. It helps them determine which version gets more conversions, engagement, or sales.
What is an example of A/B testing on social media?
An example of A/B testing on social media could be testing Instagram ad effectiveness. For example, you’d make two versions of an ad, each with different media, and then analyze which version gets more click-throughs and sales.