
Why A/B Testing Your Emails Is (Mostly) Useless — And What to Test Instead

 

Let’s start with something that will probably irritate a few people:

Most email A/B testing is not optimisation. It's a load of RUBBISH.

Not because testing is bad; testing is essential in marketing (though marketing is also chaotic, and will you ever get the same results twice?!)

But because what we call “A/B testing” in email, particularly subject line testing or a single-variant test, rarely meets the standard of what testing is supposed to be.

Testing, in its true form, is scientific (I always wanted to be a scientist, actually!). It requires control, repeatability, validation, and isolation of variables. The harsh reality is that the email inbox is one of the least controllable environments in digital marketing.

Which means the majority of subject line A/B tests you’re running are giving you data — but not truth.

And there is a very important difference between the two.

 

Before you dig in, why don't you get access to RE:markable?

 RE:markable is the weekly email about emails. Dropping the latest email marketing news, updates, insights, free resources, upcoming masterclasses, webinars, and of course, a little inbox mischief. 

 

 

The scientific problem with subject line A/B testing

In any proper experiment, you isolate one variable and hold everything else constant. If you want to test whether Variable A is better than Variable B, you must:

  • Keep the same participants
  • Keep the same environment
  • Keep the same timing
  • Keep the same external conditions
  • Repeat the experiment under identical circumstances

Now apply that to email.

When you send two subject lines to a sample of your list and declare a winner based on open rate, you are not controlling:

  • What other emails were in that inbox at the time

  • How many similar campaigns were sent that day

  • The emotional state of the recipient

  • Whether they were in a meeting, commuting, stressed, distracted

  • What email they opened just before yours

  • Whether they’ve subconsciously decided your brand is ignorable

You are not testing in isolation; you are testing inside a shared ecosystem shaped by every other marketer in the world, and that matters more than most people realise.

 

You do not send an email in isolation

Let’s take something very real: Mother’s Day opt-out campaigns (did you see my LinkedIn post?!)

What started as a thoughtful, well-meaning gesture - “If you’d rather not receive Mother’s Day emails, you can opt out” - quickly became an industry-wide trend.

Now, instead of one considerate email, consumers receive 20 or 30 near-identical messages.

Same subject lines, same structure and same emotional (honestly, I thought it was performative) framing.

When that happens, performance is no longer about wording. It’s about saturation.

Email does not have a social media algorithm filtering for which version of a trend someone sees. The inbox is shared. Every version lands. And this is the critical flaw in subject line A/B testing.

Your performance is influenced not just by your wording, but by:

  • How many similar emails arrived that day

  • How fatigued the subscriber is

  • How you’ve trained them to perceive your brand

  • What sits above and below you in their inbox

You are competing with context, not just creativity.

 

If you’re landing in the inbox, they are seeing you

If you do not have a deliverability issue (you are not landing in spam), people see your emails.

Even if they never open them. The human brain is extremely efficient at pattern recognition. When someone scrolls through their inbox, they register:

  • Your brand name

  • Your tone

  • Your cadence

  • Your predictability

  • Your subject line and preheader - WHAT YOU SAID

They form subconscious shortcuts.

“This brand always discounts.”

“This brand always sells.”

“This brand sends useful stuff.”

“I don’t need to open this.”

That decision often happens before your subject line is fully processed. This is predictive coding in action. The brain anticipates based on past experience.

Which means when you A/B test subject lines, you are often testing against a perception that has already been formed.

And that perception has very little to do with whether you added an emoji.

 

The open rate illusion

The deeper problem is the metric itself (we can be friends if you also hate open rates). Subject line A/B testing usually optimises for open rate.

But open rate is not a success metric.

People open emails to:

  • Delete them

  • Unsubscribe

  • Quickly scan and move on

  • Confirm something irrelevant

  • Satisfy curiosity

An open does not equal attention. It does not equal persuasion, it does not equal revenue, and it certainly does not equal impact.

So when Subject Line A generates a 2% higher open rate than Subject Line B, what exactly have you learned?

That more people opened. Not why (the most important part), not whether it moved them closer to action, not whether it improved long-term brand perception.

You’ve optimised the top of a funnel without validating the outcome at the bottom.

That is not strategic testing; it is cosmetic testing!

 

The validation problem

To truly validate a subject line test, you would need to recreate identical conditions.

  • Same people
  • Same day of the week
  • Same time
  • Same competing emails
  • Same emotional state
  • Same prior exposure

That is impossible!! 

You cannot re-run the same Tuesday at 8:00am in the same inbox environment, and you cannot reset someone’s memory of your previous five emails.

Which means most subject line tests are single-instance observations, not validated experiments.

They tell you what happened in one moment; they do not tell you what works systematically.
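You can see how noisy a single send is by running the numbers on one of these observations. A minimal sketch of a two-proportion z-test, assuming a hypothetical split of 1,000 recipients per arm (`statistics.NormalDist` is from Python's standard library; the open counts are made up):

```python
from statistics import NormalDist

def open_rate_z_test(opens_a, sends_a, opens_b, sends_b):
    """Two-proportion z-test on open rates from a single send.

    Returns the observed lift and a two-sided p-value. A large
    p-value means the 'winner' could easily be noise."""
    p_a = opens_a / sends_a
    p_b = opens_b / sends_b
    pooled = (opens_a + opens_b) / (sends_a + sends_b)
    se = (pooled * (1 - pooled) * (1 / sends_a + 1 / sends_b)) ** 0.5
    z = (p_a - p_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_a - p_b, p_value

# Hypothetical single send: 1,000 recipients per arm,
# Subject A opened by 220 people, Subject B by 200.
lift, p = open_rate_z_test(220, 1000, 200, 1000)
print(f"lift = {lift:.1%}, p = {p:.3f}")
```

On these made-up numbers, a 2% lift comes back with a p-value around 0.27: a result that size, at that list size, is indistinguishable from chance even before you account for all the inbox-context variables above.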

 

So what should you test instead?

Testing is not the problem; testing the wrong things is.

If you want email to actually drive revenue, pipeline, retention, or behavioural change, you need to shift from micro testing to strategic testing.

Here’s what that looks like:

 

1. Test strategy, not wording

Instead of testing: “Free shipping” vs “Don’t miss out” 

Test: Intent-based messaging vs calendar-based messaging.

For example:

  • Behaviour-triggered emails vs weekly broadcast campaigns

  • Pricing-page visit follow-ups vs generic nurture

  • Abandoned basket flows vs resend-to-non-openers

Let this run for months.

Measure:

  • Revenue per subscriber

  • Conversion rate over time

  • Assisted conversions

  • Lead-to-opportunity rate
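Revenue per subscriber, the first metric above, is worth spelling out: it divides by everyone who received the arm, not just openers, so a curiosity-bait subject line can't inflate it. A minimal sketch with made-up figures:

```python
def revenue_per_subscriber(total_revenue, audience_size):
    """Revenue attributed to the arm divided by everyone who
    received it -- not just openers or clickers -- so the metric
    rewards targeting, not bait."""
    return total_revenue / audience_size

# Hypothetical quarter: behaviour-triggered arm vs weekly broadcast arm.
triggered = revenue_per_subscriber(18_400, 12_000)   # £1.53 per subscriber
broadcast = revenue_per_subscriber(22_100, 48_000)   # £0.46 per subscriber
print(f"triggered: £{triggered:.2f}, broadcast: £{broadcast:.2f}")
```

Note how, in this hypothetical, the broadcast arm wins on total revenue but loses badly per subscriber, which is exactly the kind of strategic difference a subject line test can never surface.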

 

2. Test segmentation logic

Rather than tweaking subject lines, test your segmentation approach.

Does behaviour-based segmentation outperform static personas?

For example:

  • Highly engaged product viewers vs entire list

  • Repeat blog readers vs broad nurture

  • Category-specific past purchasers vs generic promotion

Measure:

  • Cumulative revenue over 3–6 months

  • Reduced unsubscribe rates

  • Increased lifetime value

This type of testing changes outcomes.

 

3. Test timing based on intent

This is where it becomes powerful! I LOVE intent-based email marketing.

Instead of testing whether “Last chance” performs better than “Ends soon,” test whether sending an email at the moment of behavioural signal outperforms sending one on a fixed calendar schedule.

For B2C:

  • Does sending within 30 minutes of abandoned basket outperform a 24-hour delay?

  • Does excluding recent purchasers from campaigns increase retention?


For B2B:

  • Does triggering outreach after repeated pricing page visits increase booked calls?

  • Does sending objection-handling content during evaluation phase increase conversion?

Testing is simply answering questions like these.
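The abandoned-basket timing question above can be sketched as a simple two-arm experiment. This is a hypothetical setup, not a prescribed implementation: hash-based assignment keeps each subscriber in the same arm across repeat triggers, and conversion is measured against everyone whose trigger fired, not just openers. All IDs and counts are invented:

```python
import hashlib

def assign_delay_arm(subscriber_id: str) -> str:
    """Deterministically assign a subscriber to a delay arm at the
    moment the abandonment signal fires; the same ID always lands
    in the same arm, even if the trigger fires again later."""
    digest = hashlib.sha256(subscriber_id.encode()).hexdigest()
    return "30_minute" if int(digest, 16) % 2 == 0 else "24_hour"

def conversion_rate(conversions: int, triggered: int) -> float:
    """Conversions divided by everyone whose trigger fired in the
    arm, regardless of whether they opened the email."""
    return conversions / triggered

# Hypothetical quarter of abandoned-basket sends:
fast = conversion_rate(312, 2_600)   # 30-minute delay arm
slow = conversion_rate(214, 2_600)   # 24-hour delay arm
print(f"30-min: {fast:.1%}  |  24-hour: {slow:.1%}")
```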

 

4. Test message positioning over time

Run longer-term experiments comparing:

  • Problem-led messaging vs feature-led messaging

  • Objection-handling sequences vs discount-led sequences

  • Educational onboarding vs aggressive upsell

Let these tests run across quarters.

Evaluate:

  • Retention rates

  • Repeat purchase rates

  • Sales cycle length

  • Pipeline quality

 

5. Test ecosystem changes

The most underrated tests are ecosystem tests.

What happens if you:

  • Remove resend-to-non-openers?

  • Reduce cadence by 20%?

  • Add stronger suppression rules?

  • Prioritise transactional over promotional messaging?

Does fatigue decrease?

Does long-term engagement stabilise?

Does complaint rate drop?

That is system-level optimisation.
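One of those levers, suppression rules, is easy to prototype before committing to it. A minimal sketch, assuming hypothetical subscriber records with `last_purchase` and `last_engaged` timestamps; the 14-day and 120-day thresholds are made-up examples, not recommendations:

```python
from datetime import datetime, timedelta

def should_suppress(subscriber: dict, now: datetime) -> bool:
    """Hypothetical rule: skip promotional sends to anyone who
    purchased in the last 14 days or has not engaged in 120 days."""
    last_purchase = subscriber.get("last_purchase")
    last_engaged = subscriber.get("last_engaged")
    recently_bought = (last_purchase is not None
                       and now - last_purchase < timedelta(days=14))
    long_dormant = (last_engaged is not None
                    and now - last_engaged > timedelta(days=120))
    return recently_bought or long_dormant

now = datetime(2025, 5, 1)
buyer = {"last_purchase": datetime(2025, 4, 25), "last_engaged": datetime(2025, 4, 25)}
ghost = {"last_purchase": None, "last_engaged": datetime(2024, 11, 1)}
active = {"last_purchase": datetime(2025, 1, 1), "last_engaged": datetime(2025, 4, 28)}
print(should_suppress(buyer, now), should_suppress(ghost, now), should_suppress(active, now))
```

Run the rule against last quarter's send log and compare what the suppressed sends actually contributed: if it's complaints and unsubscribes rather than revenue, the ecosystem change pays for itself.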

 

The real difference

Vanity testing asks: “What wording gets more opens today?”

Strategic testing asks: “What structure, timing, and alignment drive measurable business outcomes over time?”

One makes you feel busy, and the other makes you effective.

 

And we've reached the end...

Most subject line A/B tests give you a dopamine hit (so I can't blame you, especially when the one you knew would win, wins).

They create the illusion of optimisation, generate slides for reporting, and make small movements in unreliable metrics.

But they rarely change revenue, they rarely change retention, they rarely change perception.

If you want email to actually work, stop obsessing over the surface.

Start testing:

  • Intent alignment
  • Segmentation quality
  • Timing logic
  • Objection handling
  • Suppression rules

  • Ecosystem design

Because email is not a slot machine.

It’s a relationship channel! 

 

 

Like this blog? You'll love RE:markable

 RE:markable is the weekly email about emails. Dropping the latest email marketing news, updates, insights, free resources, upcoming masterclasses, webinars, and of course, a little inbox mischief. 

 

Email, CRM and HubSpot Support

I help marketers and businesses globally improve, design and fix their email, CRM, and HubSpot ecosystems, from strategy through to execution.

My services include:

  • Email marketing strategy, audits, training, workshops, and consultancy

  • CRM strategy and enablement

  • Full HubSpot implementations, optimisation and onboarding through my agency

If you’re looking for experienced external support (and lots of enjoyment along the way), this is where to start.