ICE Scoring Framework: How to Prioritize CRO Test Ideas Effectively
You have 50 test ideas and can only run 3 per month. How do you pick the right ones? The ICE scoring framework is the most popular method — but most teams use it wrong. This guide shows you how to use it correctly, when to use alternatives, and how to build a prioritization system that actually works.
What Is ICE Scoring?
ICE stands for:
- Impact — How much will this improve conversion/revenue if it works?
- Confidence — How sure are we that it will work?
- Ease — How easy is it to implement and test?
Each factor is scored 1-10, and the ICE score = Impact × Confidence × Ease, giving a range of 1 to 1,000.
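To make the arithmetic concrete, here is a minimal Python sketch (the `TestIdea` class and its field names are our own illustration, not part of any standard library):

```python
from dataclasses import dataclass

@dataclass
class TestIdea:
    name: str
    impact: int      # 1-10: revenue effect if the test wins
    confidence: int  # 1-10: strength of the supporting evidence
    ease: int        # 1-10: how easy the test is to build and measure

    @property
    def ice_score(self) -> int:
        # Multiplicative ICE: Impact x Confidence x Ease, range 1-1000
        return self.impact * self.confidence * self.ease

print(TestIdea("Add express checkout", 9, 9, 6).ice_score)  # 486
```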
How to Score Each Factor
Impact (1-10)
Estimate the revenue effect if the test wins.
| Score | Impact Level | Example |
|---|---|---|
| 1-2 | Marginal | Button color change, minor copy tweak |
| 3-4 | Moderate | New social proof section, improved product descriptions |
| 5-6 | Significant | Restructured checkout flow, new pricing presentation |
| 7-8 | High | Express checkout integration, personalized homepage |
| 9-10 | Transformative | Complete funnel redesign, new business model element |
Tips for scoring Impact:
- Consider the page’s traffic volume (a small improvement on a high-traffic page has more impact than a large improvement on a low-traffic page)
- Calculate potential revenue: estimated CVR lift × monthly visitors × AOV (see the sketch after this list)
- Score based on revenue impact, not just CVR impact
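A quick sketch of that revenue calculation (the input numbers are hypothetical):

```python
def monthly_revenue_impact(cvr_lift: float, monthly_visitors: int, aov: float) -> float:
    """Estimated extra revenue per month if the test wins.

    cvr_lift is the absolute CVR change, e.g. 0.005 for +0.5 percentage points.
    """
    return cvr_lift * monthly_visitors * aov

# Hypothetical page: 100,000 visitors/month, $80 AOV, +0.5pp predicted lift
print(monthly_revenue_impact(0.005, 100_000, 80.0))  # 40000.0
```

An estimate like this is what separates a 3 from a 7 on the Impact scale: the same relative lift is worth far more on a high-traffic page.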
Confidence (1-10)
How confident are you that this change will produce the predicted improvement?
| Score | Evidence Level | Example |
|---|---|---|
| 1-2 | Gut feeling / opinion | “I think users would like a video here” |
| 3-4 | Industry best practice | “Articles say exit-intent popups increase signups” |
| 5-6 | Competitor or case study evidence | “Competitor X added this and reported a 20% lift” |
| 7-8 | Your own qualitative data | “Session recordings show users struggling with this exact element” |
| 9-10 | Your own quantitative data | “Analytics show 60% drop-off at this step; user surveys confirm the reason” |
Tips for scoring Confidence:
- Require evidence, not opinions. Ask: “What data supports this?”
- Multiple data points increase confidence
- Past test results on similar changes increase confidence
Ease (1-10)
How easy is it to build, launch, and measure this test?
| Score | Effort Level | Example |
|---|---|---|
| 1-2 | Weeks of dev + design work | Complete checkout rebuild, new payment integration |
| 3-4 | Days of dev + design work | New page layout, complex A/B test setup |
| 5-6 | 1-2 days of work | New section design, multi-element test |
| 7-8 | A few hours | Copy change, CTA button test, image swap |
| 9-10 | Minutes (visual editor change) | Headline test, button color, badge addition |
ICE Scoring Example
| Test Idea | Impact | Confidence | Ease | ICE Score |
|---|---|---|---|---|
| Add express checkout (Shop Pay, Apple Pay) | 9 | 9 | 6 | 486 |
| Redesign product page with benefit-first copy | 7 | 7 | 5 | 245 |
| Add free shipping progress bar to cart | 7 | 8 | 8 | 448 |
| Change CTA button from blue to green | 1 | 2 | 10 | 20 |
| Add exit-intent popup with 10% discount | 6 | 7 | 8 | 336 |
| Complete checkout flow redesign | 9 | 6 | 2 | 108 |
Priority order: Express checkout (486) → Shipping bar (448) → Exit popup (336) → Product page (245) → Checkout redesign (108) → Button color (20)
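Reusing the `TestIdea` sketch from earlier, sorting the backlog by ICE score reproduces this order:

```python
ideas = [
    TestIdea("Add express checkout (Shop Pay, Apple Pay)", 9, 9, 6),
    TestIdea("Redesign product page with benefit-first copy", 7, 7, 5),
    TestIdea("Add free shipping progress bar to cart", 7, 8, 8),
    TestIdea("Change CTA button from blue to green", 1, 2, 10),
    TestIdea("Add exit-intent popup with 10% discount", 6, 7, 8),
    TestIdea("Complete checkout flow redesign", 9, 6, 2),
]

for idea in sorted(ideas, key=lambda i: i.ice_score, reverse=True):
    print(f"{idea.ice_score:>4}  {idea.name}")
# 486, 448, 336, 245, 108, 20 -- the same ranking as above
```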
Common ICE Scoring Mistakes
1. Everyone scores differently
Problem: Your 7 is someone else’s 4. Without calibration, scores are meaningless. Fix: Use the scoring rubrics above. Have the team score independently, then discuss and calibrate.
2. High-Ease bias
Problem: Easy tests always bubble to the top, even if impact is low. Fix: Set a minimum Impact threshold (5 or higher) before a test enters the backlog. Easy + low-impact = waste of a test slot.
3. Confidence without evidence
Problem: Teams rate confidence based on how much they personally like the idea. Fix: Require at least one data source for any Confidence score above 5.
4. Not updating scores
Problem: Ideas scored 6 months ago based on old data. Fix: Re-score quarterly as new data becomes available.
ICE vs Alternative Frameworks
| Framework | Factors | Best For | Weakness |
|---|---|---|---|
| ICE | Impact, Confidence, Ease | Quick scoring, small teams | Subjective, no behavioral science grounding |
| PIE | Potential, Importance, Ease | Page-level prioritization | “Importance” is vague |
| PXL | Binary criteria checklist | Reducing subjectivity | Complex, requires training |
| AXR | Assumption, eXpected impact, Resource cost | Behavioral science-driven CRO | Requires heuristic analysis expertise |
| RICE | Reach, Impact, Confidence, Effort | Product teams | ”Reach” adds complexity |
The AXR Framework (acceleroi’s Approach)
AXR improves on ICE by grounding confidence in behavioral science:
- Assumption Strength — Is the hypothesis backed by a recognized behavioral principle (cognitive bias, heuristic, or UX pattern)? Scored based on evidence strength.
- eXpected Impact — Revenue impact estimated as page traffic × predicted CVR lift × AOV
- Resource Cost — Implementation effort (inverted: easy = high score)
The key difference: AXR’s “Assumption Strength” requires citing a specific behavioral science principle, not just a feeling of confidence. This produces more reliable prioritization.
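As a rough illustration only: the evidence tiers, revenue bands, and multiplicative combination in this sketch are our own assumptions for showing the shape of such a score, not acceleroi’s published scoring logic.

```python
# Hypothetical evidence tiers for Assumption Strength -- illustrative only
ASSUMPTION_STRENGTH = {
    "no cited principle": 2,
    "generic UX pattern": 5,
    "documented heuristic or cognitive bias": 7,
    "behavioral principle confirmed by own data": 9,
}

def revenue_band(revenue: float) -> int:
    """Map estimated monthly revenue impact onto 1-10 (band cutoffs are ours)."""
    cutoffs = [500, 1_000, 2_500, 5_000, 10_000, 25_000, 50_000, 100_000, 250_000]
    return 1 + sum(revenue >= c for c in cutoffs)

def axr_score(evidence: str, traffic: int, cvr_lift: float, aov: float,
              resource_score: int) -> int:
    """Hypothetical AXR-style composition, assuming an ICE-like product.

    resource_score is inverted like Ease: easy to build = high score (1-10).
    """
    expected_impact = revenue_band(traffic * cvr_lift * aov)
    return ASSUMPTION_STRENGTH[evidence] * expected_impact * resource_score

print(axr_score("documented heuristic or cognitive bias",
                traffic=100_000, cvr_lift=0.005, aov=80.0, resource_score=6))  # 294
```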
Building Your Test Backlog
Step 1: Gather all test ideas from research
Pull ideas from: analytics findings, heatmap insights, session recording observations, user survey feedback, competitor analysis, and team brainstorming.
Step 2: Score each idea with ICE (or AXR)
Have 2-3 team members score independently, then average each factor before computing the ICE product.
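For example, averaging each factor across scorers before multiplying (the individual scores here are made up):

```python
from statistics import mean

# Three scorers each submit (impact, confidence, ease) for one idea
scores = [(9, 8, 6), (8, 9, 6), (9, 9, 5)]

impact, confidence, ease = (mean(col) for col in zip(*scores))
print(round(impact * confidence * ease))  # ~426 for these inputs
```

Averaging per factor rather than per final score also makes calibration easier: a large spread on a single factor is an immediate flag for discussion.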
Step 3: Set thresholds
- Test immediately: Score above 300
- Test next quarter: Score 150-300
- Backlog / revisit: Score 50-150
- Don’t test: Score below 50
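These cutoffs translate directly into code; a minimal sketch:

```python
def priority_bucket(ice_score: float) -> str:
    """Map an ICE score to the action thresholds above."""
    if ice_score > 300:
        return "test immediately"
    if ice_score >= 150:
        return "test next quarter"
    if ice_score >= 50:
        return "backlog / revisit"
    return "don't test"

print(priority_bucket(486))  # test immediately
print(priority_bucket(108))  # backlog / revisit
```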
Step 4: Plan your testing calendar
Based on your testing velocity (tests per month), slot the top-scoring ideas into your calendar.
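A sketch of that slotting, assuming a fixed monthly velocity (the month labels, velocity, and backlog names are placeholders):

```python
from itertools import islice

def plan_calendar(ranked_ideas: list[str], velocity: int,
                  months: list[str]) -> dict[str, list[str]]:
    """Fill each month with the next `velocity` highest-scoring ideas."""
    remaining = iter(ranked_ideas)
    return {month: list(islice(remaining, velocity)) for month in months}

backlog = ["Express checkout", "Shipping bar", "Exit popup",
           "Product page copy", "Checkout redesign"]
print(plan_calendar(backlog, velocity=3, months=["Month 1", "Month 2"]))
# {'Month 1': ['Express checkout', 'Shipping bar', 'Exit popup'],
#  'Month 2': ['Product page copy', 'Checkout redesign']}
```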
Step 5: Review and re-score monthly
As you learn from test results, update scores on remaining ideas.
Skip the manual scoring. Our AI audit engine automatically scores every recommendation using the AXR framework — grounded in 40+ behavioral science heuristics and calibrated against 1,000+ historical A/B test results.