Let’s start with something that will probably irritate a few people:
Most email A/B testing is not optimisation; it’s a load of RUBBISH.
Not because testing is bad. Testing is essential in marketing (though marketing is chaotic, and will you ever get the same results twice?!).
But because what we call “A/B testing” in email, particularly subject line testing or a single-variant test, rarely meets the standard of what testing is supposed to be.
Testing, in its true form, is scientific (I always wanted to be a scientist, actually!). It requires control, repeatability, validation, and isolation of variables. The harsh reality is that the email inbox is one of the least controllable environments in digital marketing.
Which means the majority of subject line A/B tests you’re running are giving you data — but not truth.
And there is a very important difference between the two.
In any proper experiment, you isolate one variable and hold everything else constant. If you want to test whether Variable A is better than Variable B, you must:
Randomise who receives each version
Change only the one variable you are testing
Hold every other condition constant
Repeat the test and validate the result
Now apply that to email.
When you send two subject lines to a sample of your list and declare a winner based on open rate, you are not controlling:
What other emails were in that inbox at the time
How many similar campaigns were sent that day
The emotional state of the recipient
Whether they were in a meeting, commuting, stressed, distracted
What email they opened just before yours
Whether they’ve subconsciously decided your brand is ignorable
You are not testing in isolation; you are testing inside a shared ecosystem shaped by every other marketer in the world. And that matters more than most people realise.
Let’s take something very real: Mother’s Day opt-out campaigns (did you see my LinkedIn post?!)
What started as a thoughtful, well-meaning gesture - “If you’d rather not receive Mother’s Day emails, you can opt out” - quickly became an industry-wide trend.
Now, instead of one considerate email, consumers receive 20 or 30 near-identical messages.
Same subject lines, same structure and the same emotional framing (which, honestly, I thought was performative).
When that happens, performance is no longer about wording. It’s about saturation.
Email does not have a social media algorithm filtering for which version of a trend someone sees. The inbox is shared. Every version lands. And this is the critical flaw in subject line A/B testing.
Your performance is influenced not just by your wording, but by:
How many similar emails arrived that day
How fatigued the subscriber is
How you’ve trained them to perceive your brand
What sits above and below you in their inbox
You are competing with context, not just creativity.
If you do not have a deliverability issue (you are not landing in spam), people see your emails.
Even if they never open them. The human brain is extremely efficient at pattern recognition. When someone scrolls through their inbox, they register:
Your brand name
Your tone
Your cadence
Your predictability
They form subconscious shortcuts.
“This brand always discounts.”
“This brand always sells.”
“This brand sends useful stuff.”
“I don’t need to open this.”
That decision often happens before your subject line is fully processed. This is predictive coding in action. The brain anticipates based on past experience.
Which means when you A/B test subject lines, you are often testing against a perception that has already been formed.
And that perception has very little to do with whether you added an emoji.
The deeper problem is the metric itself (we can be friends if you also hate open rates). Subject line A/B testing usually optimises for open rate.
But open rate is not a success metric.
People open emails to:
Delete them
Unsubscribe
Quickly scan and move on
Confirm something irrelevant
Satisfy curiosity
An open does not equal attention, it does not equal persuasion, it does not equal revenue, and it certainly does not equal impact.
So when Subject Line A generates a 2% higher open rate than Subject Line B, what exactly have you learned?
That more people opened. Not why (the most important part), not whether it moved them closer to action, not whether it improved long-term brand perception.
You’ve optimised the top of a funnel without validating the outcome at the bottom.
That is not strategic testing; it is cosmetic testing!
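And if you want to sanity-check that 2% before celebrating, the arithmetic is quick. Here is a minimal sketch of a two-proportion z-test in Python; the sample sizes and open counts are invented purely for illustration:

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(opens_a: int, n_a: int, opens_b: int, n_b: int):
    """Two-sided z-test for the difference between two open rates."""
    p_a, p_b = opens_a / n_a, opens_b / n_b
    p_pool = (opens_a + opens_b) / (n_a + n_b)             # pooled open rate
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical send: 5,000 recipients per variant,
# Variant A opens at 22%, Variant B at 20%.
z, p = two_proportion_z_test(1100, 5000, 1000, 5000)
print(f"z = {z:.2f}, p = {p:.3f}")   # p ≈ 0.014 here; at 500 per variant, p ≈ 0.44
```

Even when the p-value clears the bar, all it confirms is that more people opened that one send. It still tells you nothing about why, and nothing about revenue.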
To truly validate a subject line test, you would need to recreate identical conditions.
That is impossible!!
You cannot re-run the same Tuesday at 8:00am in the same inbox environment, and you cannot reset someone’s memory of your previous five emails.
Which means most subject line tests are single-instance observations, not validated experiments.
They tell you what happened in one moment; they do not tell you what works systematically.
Testing is not the problem; testing the wrong things is.
If you want email to actually drive revenue, pipeline, retention, or behavioural change, you need to shift from micro testing to strategic testing.
Here’s what that looks like:
Instead of testing: “Free shipping” vs “Don’t miss out”
Test: Intent-based messaging vs calendar-based messaging.
For example:
Behaviour-triggered emails vs weekly broadcast campaigns
Pricing-page visit follow-ups vs generic nurture
Abandoned basket flows vs resend-to-non-openers
Let this run for months.
Measure:
Revenue per subscriber
Conversion rate over time
Assisted conversions
Lead-to-opportunity rate
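As a sketch of what that measurement could look like in practice, assume a hypothetical export with one row per subscriber, the test arm they were assigned to, and the revenue attributed to them across the whole window (the file and column names are mine, not any real platform’s):

```python
import csv
from collections import defaultdict

# Hypothetical export: one row per subscriber, with the test arm they were
# assigned to ("behaviour_triggered" vs "weekly_broadcast") and the revenue
# attributed to them over the full test window.
totals = defaultdict(lambda: {"revenue": 0.0, "subscribers": 0})

with open("email_test_export.csv", newline="") as f:   # illustrative filename
    for row in csv.DictReader(f):
        arm = totals[row["arm"]]
        arm["revenue"] += float(row["revenue"])
        arm["subscribers"] += 1

for name, t in totals.items():
    rps = t["revenue"] / t["subscribers"]              # revenue per subscriber
    print(f"{name}: {t['subscribers']} subscribers, £{rps:.2f} per subscriber")
```

The unit of measurement is the point: revenue per subscriber per arm over months, not opens per send.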
Rather than tweaking subject lines, test your segmentation approach.
Does behaviour-based segmentation outperform static personas?
For example:
Highly engaged product viewers vs entire list
Repeat blog readers vs broad nurture
Category-specific past purchasers vs generic promotion
Measure:
Cumulative revenue over 3–6 months
Reduced unsubscribe rates
Increased lifetime value
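For illustration, a behaviour-based segment is just a rule over recent events. A minimal sketch, where the event shape and the “3+ product views in 14 days” threshold are assumptions of mine, not a recommendation:

```python
from datetime import datetime, timedelta

# Hypothetical behaviour-based segment: "highly engaged product viewers"
# = three or more product-page views in the last 14 days.
def is_engaged_viewer(events: list, now: datetime) -> bool:
    cutoff = now - timedelta(days=14)
    views = [e for e in events
             if e["type"] == "product_view" and e["at"] >= cutoff]
    return len(views) >= 3

events = [
    {"type": "product_view", "at": datetime(2024, 5, 1)},
    {"type": "product_view", "at": datetime(2024, 5, 3)},
    {"type": "product_view", "at": datetime(2024, 5, 9)},
]
print(is_engaged_viewer(events, now=datetime(2024, 5, 10)))  # True
```

The test is then the same holdout discipline as before: that segment vs the whole list, judged on revenue and unsubscribes, not opens.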
This type of testing changes outcomes.
This is where it becomes powerful! I LOVE intent-based email marketing.
Instead of testing whether “Last chance” performs better than “Ends soon,” test whether sending an email at the moment of behavioural signal outperforms sending one on a fixed calendar schedule.
For B2C:
Does sending within 30 minutes of basket abandonment outperform a 24-hour delay?
Does excluding recent purchasers from campaigns increase retention?
For B2B:
Does triggering outreach after repeated pricing page visits increase booked calls?
Does sending objection-handling content during the evaluation phase increase conversion?
Testing, done properly, is just answering questions like those.
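To make the abandoned-basket timing question concrete, here is a minimal sketch of deterministic arm assignment for a 30-minute vs 24-hour delay test. The hashing trick keeps each subscriber in one arm for the life of the experiment; the hand-off to your ESP’s scheduler is deliberately left out:

```python
import hashlib

# Hypothetical experiment: every abandoned-basket event is assigned to a
# 30-minute or 24-hour send delay, and a given subscriber always lands in
# the same arm because assignment is a hash of their id.
ARMS = {"fast": 30 * 60, "slow": 24 * 60 * 60}   # delay in seconds

def assign_arm(subscriber_id: str, experiment: str = "basket-delay-v1") -> str:
    digest = hashlib.sha256(f"{experiment}:{subscriber_id}".encode()).hexdigest()
    return "fast" if int(digest, 16) % 2 == 0 else "slow"

def schedule_basket_email(subscriber_id: str, abandoned_at: float):
    arm = assign_arm(subscriber_id)
    send_at = abandoned_at + ARMS[arm]     # epoch seconds for the scheduler
    return arm, send_at

print(schedule_basket_email("subscriber-123", abandoned_at=1_700_000_000.0))
```

Then judge the arms on revenue per triggered email over months, not on which send got opened faster.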
Run longer-term experiments comparing:
Problem-led messaging vs feature-led messaging
Objection-handling sequences vs discount-led sequences
Educational onboarding vs aggressive upsell
Let these tests run across quarters.
Evaluate:
Retention rates
Repeat purchase rates
Sales cycle length
Pipeline quality
The most underrated tests are ecosystem tests.
What happens if you:
Remove resend-to-non-openers?
Reduce cadence by 20%?
Add stronger suppression rules?
Prioritise transactional over promotional messaging?
Does fatigue decrease?
Does long-term engagement stabilise?
Does complaint rate drop?
That is system-level optimisation.
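Reading an ecosystem test can be as simple as comparing a holdout kept on the old cadence against the reduced-cadence group over a full quarter. Every number below is invented, purely to show the shape of the readout:

```python
# Hypothetical quarterly readout for a "reduce cadence by 20%" test.
# All figures are illustrative, not benchmarks.
arms = {
    "control (old cadence)":  {"sends": 120_000, "unsubs": 960, "complaints": 84},
    "treatment (-20% sends)": {"sends":  96_000, "unsubs": 580, "complaints": 41},
}

for name, a in arms.items():
    unsub_rate = a["unsubs"] / a["sends"] * 100
    complaint_rate = a["complaints"] / a["sends"] * 100
    print(f"{name}: {unsub_rate:.2f}% unsubscribe, {complaint_rate:.3f}% complaint")
```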
Vanity testing asks: “What wording gets more opens today?”
Strategic testing asks: “What structure, timing, and alignment drive measurable business outcomes over time?”
One makes you feel busy, and the other makes you effective.
Most subject line A/B tests give you a dopamine hit (so I can’t blame you, especially when the one you knew would win, wins).
They create the illusion of optimisation, they generate slides for reporting and they make small movements in unreliable metrics.
But they rarely change revenue, they rarely change retention, they rarely change perception.
If you want email to actually work, stop obsessing over the surface.
Start testing:
Triggers vs broadcast calendars
Segmentation models
Intent-based timing
Message strategy
Cadence and suppression rules
Because email is not a slot machine.
It’s a relationship channel!