What is Statistical Significance?
Statistical significance is a measure of whether the difference in performance between two test variations reflects a real effect or merely random chance. In A/B testing, a result is considered statistically significant when the p-value, the probability of observing a difference at least as large as the one measured if the two variations actually performed identically, falls below a predetermined threshold, typically 5%.
In practical terms, when a test reaches 95% statistical significance, it means there is less than a 5% probability that a difference at least this large would have occurred if the two variations actually performed identically.
How Statistical Significance works
Every A/B test produces observed conversion rates for each variation. But these observed rates contain noise — if you split traffic between two identical pages, random variation would still produce slightly different conversion rates. Statistical significance quantifies whether the observed difference is large enough, given the sample size, to be confident it reflects a real underlying difference.
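You can see this noise directly. The short Python sketch below simulates an A/A split in which both pages share the same true 3% conversion rate; the rate, traffic, and seed are illustrative assumptions, not figures from a real test:

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Both "variations" share the same true conversion rate: 3%.
true_rate = 0.03
visitors = 500

# One simulated A/A split: identical pages, different observed rates.
conversions_a = rng.binomial(visitors, true_rate)
conversions_b = rng.binomial(visitors, true_rate)
print(f"Variation A observed: {conversions_a / visitors:.2%}")
print(f"Variation B observed: {conversions_b / visitors:.2%}")
```

Rerunning with different seeds produces gaps in both directions, which is exactly the noise a significance test has to account for.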
The key inputs are:
- Observed conversion rates for each variation
- Sample size in each variation
- Significance level (alpha) — the threshold below which you reject the null hypothesis (typically 0.05, or 5%)
If the calculated p-value is less than alpha, the result is statistically significant. If not, the test is inconclusive — which does not mean the variations are equal, only that you do not have enough evidence to declare a winner.
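A common way to turn these inputs into a p-value for conversion rates is the two-proportion z-test. The sketch below is a minimal Python implementation, with illustrative conversion counts; it also returns the confidence interval that should accompany every significance result:

```python
from math import sqrt, erfc

def two_proportion_z_test(conv_a, n_a, conv_b, n_b, alpha=0.05):
    """Two-sided pooled z-test for a difference in conversion rates.

    conv_a / conv_b: conversion counts; n_a / n_b: visitors per variation.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b

    # Pooled rate under the null hypothesis that both variations convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se_pooled = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se_pooled

    # Two-sided p-value from the standard normal tail: 2 * (1 - Phi(|z|)).
    p_value = erfc(abs(z) / sqrt(2))

    # 95% confidence interval for the difference, using the unpooled SE.
    se_unpooled = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    margin = 1.96 * se_unpooled  # z critical value for a 95% interval
    ci = (p_b - p_a - margin, p_b - p_a + margin)

    return p_value, ci, p_value < alpha

# Illustrative example: 3.0% vs 3.6% conversion over 10,000 visitors each.
p, ci, significant = two_proportion_z_test(300, 10_000, 360, 10_000)
print(f"p-value: {p:.4f}, 95% CI for lift: ({ci[0]:+.4f}, {ci[1]:+.4f}), significant: {significant}")
```

In this example the p-value is roughly 0.018, so the result clears the 5% threshold, and the confidence interval shows how large the lift plausibly is.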
Why it matters for eCommerce and SaaS
Without statistical significance, A/B test results are unreliable. A variation might appear to convert 20% better after 500 visitors, but that difference could easily vanish — or reverse — once 5,000 visitors have been tested. Shipping changes based on insignificant results leads to a pattern of random wins and losses that erodes trust in the optimization process.
For eCommerce businesses, the stakes are high. A false positive on a checkout page test might reduce conversion rate for weeks before the team realizes the “winning” variation was actually noise. For SaaS businesses, a false positive on a pricing page can suppress signups during a critical growth period.
Conversely, requiring statistical significance prevents teams from shipping changes that merely look good in the short term. It enforces patience and rigor, which are the hallmarks of effective CRO programs.
Common mistakes
- Calling tests early — Checking significance before reaching the planned sample size dramatically inflates the false positive rate. A test might cross the 95% threshold on day 2 and then drop below it by day 7. Always run to the predetermined sample size; the simulation sketch after this list shows how severe the inflation can be.
- Confusing significance with importance — A result can be statistically significant but commercially trivial. A 0.1% relative lift might reach significance with a very large sample, but if it translates to $50/month in revenue, it is not worth implementing. Pair significance testing with minimum detectable effect (MDE) planning.
- Ignoring confidence intervals — Significance tells you whether a difference exists; the confidence interval tells you how large it is. Always report both.
- Testing too many metrics — If you test 20 metrics at the 95% level, at least one is likely to appear significant by pure chance. Use a primary metric and apply corrections for any secondary metrics; the quick calculation after this list makes the risk concrete.
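To make the first mistake concrete, here is a rough Monte Carlo sketch of peeking. It simulates A/A tests in which no real difference exists and applies a naive daily significance check; the trial count, traffic, and conversion rate are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(seed=7)

def peeking_false_positive_rate(trials=2_000, days=14, daily_visitors=500, rate=0.03):
    """Simulate A/A tests (no true difference) with a naive daily
    significance check; count tests that ever cross p < 0.05."""
    false_positives = 0
    for _ in range(trials):
        conv_a = conv_b = n = 0
        for _ in range(days):
            n += daily_visitors
            conv_a += rng.binomial(daily_visitors, rate)
            conv_b += rng.binomial(daily_visitors, rate)
            p_a, p_b = conv_a / n, conv_b / n
            p_pool = (conv_a + conv_b) / (2 * n)
            se = np.sqrt(p_pool * (1 - p_pool) * 2 / n)
            # |z| > 1.96 is the two-sided 95% criterion (p < 0.05).
            if se > 0 and abs(p_b - p_a) / se > 1.96:
                false_positives += 1
                break  # the team "calls" the test and stops
    return false_positives / trials

print(f"False positive rate with daily peeking: {peeking_false_positive_rate():.1%}")
```

Checked once at a fixed horizon, about 5% of these null tests would cross the threshold; giving the test fourteen chances to stop early pushes the rate well above that.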
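The multiple-metrics problem is plain arithmetic. If the 20 metrics were independent, the chance that at least one crosses the 95% threshold by luck alone is 1 - 0.95^20, roughly 64%. A Bonferroni correction, one common if conservative remedy, simply divides alpha by the number of metrics:

```python
alpha = 0.05
metrics = 20

# Chance that at least one of 20 independent null metrics looks significant.
family_wise_error = 1 - (1 - alpha) ** metrics
print(f"Probability of a spurious 'significant' metric: {family_wise_error:.0%}")  # ~64%

# Bonferroni correction: hold each secondary metric to alpha / metrics.
print(f"Corrected per-metric threshold: {alpha / metrics:.4f}")  # 0.0025
```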
Industry standard
The CRO industry has standardized on a 95% confidence level (alpha = 0.05) and 80% power (beta = 0.20). These thresholds balance the risk of false positives against practical test durations. Some teams use 90% confidence for exploratory tests and 99% for high-stakes changes like pricing.
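These thresholds also determine how large a test must be before it starts. The sketch below uses the standard two-proportion sample-size approximation at 95% confidence and 80% power; the 3% baseline rate and 10% relative minimum detectable effect are illustrative assumptions:

```python
from math import ceil

def sample_size_per_variation(baseline, mde_relative, z_alpha=1.96, z_beta=0.8416):
    """Approximate visitors needed per variation for a two-proportion test.

    z_alpha: 95% confidence (two-sided alpha = 0.05); z_beta: 80% power.
    """
    p1 = baseline
    p2 = baseline * (1 + mde_relative)  # the rate we want to be able to detect
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2
    return ceil(n)

# Illustrative: 3% baseline conversion, detecting a 10% relative lift.
print(sample_size_per_variation(0.03, 0.10))  # about 53,000 visitors per arm
```

This is why low-traffic pages are hard to test rigorously: detecting a modest relative lift at these thresholds can require tens of thousands of visitors per variation.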
How acceleroi approaches it
At acceleroi, we set significance thresholds and sample sizes during test planning, before any data is collected. We commit to running tests to completion and never declare a winner prematurely. Every test report includes the p-value, confidence interval, observed power, and the revenue impact at the observed effect size — giving clients the full picture needed to make an informed decision about whether to ship the change.
Related resources
- Get a free CRO audit to evaluate whether your current tests are properly sized for significance
- Read our blog for guides on interpreting A/B test results