Product Strategy
Eligibility Beats Ranking: Fix Your Catalog Before You Fix Search
In e-commerce, discovery failure often isn’t an algorithm problem — it’s a catalog problem. If product identity, attributes, and taxonomy are inconsistent, you don’t just rank lower. You become ineligible across filters, feeds, and AI-driven discovery surfaces. Treat catalog integrity like a growth product with owners, contracts, and measurable coverage.
Key insight
Discovery is an eligibility game: inconsistent product data doesn’t just reduce ranking — it removes you from the candidate set across search, filters, feeds, and emerging AI shopping surfaces.
If “128 GB” and “128GB” are two different values in your catalog, your discovery stack is already broken.
Not because customers notice the whitespace.
Because machines do.
Filters split into duplicates. Search recall silently drops. Feeds become inconsistent. And the shopper hits dead ends that feel like “your site doesn’t have what I want.”
That’s why I’ve started thinking about product discovery as an *eligibility* problem, not a ranking problem.
If your product data is inconsistent, you don’t always rank lower. Sometimes you simply don’t make it into the candidate set.
## Why Discovery Fails Upstream
Most discovery conversations get pulled into the visible layer:
- “Let’s improve the search algorithm.”
- “Let’s rewrite category copy.”
- “Let’s tune relevance.”
Those matter — but they assume the input is coherent.
Discovery systems (on-site search, faceted navigation, Google Shopping, marketplaces, and now AI-assisted shopping surfaces) are only as good as what they can *understand and compare*.
And that understanding starts upstream in three places:
1) **Identity** (what the product is, and how variants relate)
2) **Attributes** (what’s true about it, consistently formatted)
3) **Taxonomy** (where it belongs, and what metadata it inherits)
If any of those drift, discovery becomes noisy even when the front-end looks fine.
## The Non-Obvious Mechanism: Eligibility Beats Rank
The ranking mental model suggests a smooth curve:
Bad data → slightly worse rank → slightly less traffic.
But retail discovery often behaves like a hard gate:
Bad data → not eligible → not shown.
Three examples that show up repeatedly in real catalogs:
- **Attribute fragmentation:** the same attribute appears as two values (“128GB” vs “128 GB”, “Blue” vs “Navy”, “1 L” vs “1000ml”). The system now treats one human truth as multiple machine truths. Filters duplicate. Counts become misleading. Shoppers lose confidence.
- **Variant drift:** sizes/colors/pack counts exist as separate products instead of variants under one parent. This inflates the catalog, dilutes signals (reviews, sales, clicks), and creates confusing search results where the “same” product appears multiple times.
- **Category mismatch:** a product lives in the wrong branch, inherits the wrong attributes, and becomes invisible for the queries where it should compete.
Once you see discovery as an eligibility pipeline, it becomes clear why “search tuning” sometimes does nothing:
You are tuning the ordering of a set that is missing the right items.
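To make the gate concrete, here is a minimal sketch in Python. It assumes a filter that matches facet values by exact string comparison, which is a simplification rather than a description of any specific search engine. The `Product` fields, the `popularity` score, and the sample catalog are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Product:
    sku: str
    title: str
    storage: str       # raw attribute value as stored in the catalog
    popularity: float  # stand-in for whatever signal the ranker uses


def candidates(catalog, storage_filter):
    """Eligibility step: exact string match on the facet value."""
    return [p for p in catalog if p.storage == storage_filter]


def rank(products):
    """Ranking step: it only ever sees what the eligibility step let through."""
    return sorted(products, key=lambda p: p.popularity, reverse=True)


catalog = [
    Product("A1", "Phone A", "128GB", 0.40),
    Product("B2", "Phone B", "128 GB", 0.95),  # strongest product, inconsistent value
    Product("C3", "Phone C", "128GB", 0.10),
]

results = rank(candidates(catalog, "128GB"))
result_skus = [p.sku for p in results]
print(result_skus)  # ['A1', 'C3']
```

In this toy catalog the strongest product never reaches `rank`, so no change to the scoring function can bring it back. Only fixing the stored value (or normalizing it before matching) does.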
## A Simple Framework: The Catalog Contract
I like thinking of product data as a contract between four parties:
- **Merchandising** (what we sell and how we group it)
- **Product** (how shoppers discover, compare, and choose)
- **Operations** (availability, shipping, returns, and service constraints)
- **Analytics** (what we measure, learn, and improve)
A good contract is explicit, testable, and owned.
Here’s a practical version you can implement without a platform rewrite:
### 1) Identity Contract (what a “product” means)
- Define what is a parent product vs a variant.
- Enforce one canonical naming convention across site, feeds, and internal tools.
- Decide which attributes are variant-defining (size, color) vs descriptive (material, compatibility).
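One way to make the identity contract testable is a small check over a flat SKU export. The sketch below assumes each row carries a `parent_id` and that `size` and `color` are the variant-defining attributes; both the field names and the sample rows are hypothetical. It flags parents where two variants share the same variant-defining combination, which usually points to a duplicate or a mis-assigned SKU.

```python
from collections import defaultdict

# Assumption: these attributes distinguish variants; everything else is descriptive.
VARIANT_DEFINING = ("size", "color")


def group_variants(rows):
    """Group flat SKU rows into {parent_id: [variant rows]}."""
    families = defaultdict(list)
    for row in rows:
        families[row["parent_id"]].append(row)
    return dict(families)


def identity_violations(families):
    """Return parent ids where two variants share the same variant-defining values."""
    problems = []
    for parent_id, variants in families.items():
        seen = set()
        for variant in variants:
            key = tuple(variant.get(attr) for attr in VARIANT_DEFINING)
            if key in seen:
                problems.append(parent_id)
                break
            seen.add(key)
    return problems


rows = [
    {"sku": "TS-1", "parent_id": "tee", "size": "M", "color": "Blue", "material": "Cotton"},
    {"sku": "TS-2", "parent_id": "tee", "size": "L", "color": "Blue", "material": "Cotton"},
    {"sku": "TS-3", "parent_id": "tee", "size": "L", "color": "Blue", "material": "Cotton"},
    {"sku": "MG-1", "parent_id": "mug", "size": "350 ml", "color": "White", "material": "Ceramic"},
]

families = group_variants(rows)
violations = identity_violations(families)
print(violations)  # ['tee']
```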
### 2) Attribute Contract (how values are represented)
- Standardize units and formatting (“1000 ml” vs “1 L”, whitespace, casing).
- Enforce allowed-value lists for high-impact attributes.
- Build synonym rules where human language varies but meaning is stable.
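The three rules above can be sketched as small normalizers. This is an illustration under stated assumptions, not a complete unit library: the allowed-value list, the synonym map, and the function names are all hypothetical, and a value that cannot be parsed returns `None` so it can be routed to review instead of being guessed.

```python
import re

# Assumptions for illustration: a closed list for storage and a tiny synonym map.
ALLOWED_STORAGE = {"64 GB", "128 GB", "256 GB"}
COLOR_SYNONYMS = {"navy": "Navy", "navy blue": "Navy", "grey": "Gray", "gray": "Gray"}


def normalize_storage(raw):
    """Map '128GB', ' 128 gb ' and similar to the canonical '128 GB'."""
    match = re.fullmatch(r"\s*(\d+)\s*(gb|tb)\s*", raw, flags=re.IGNORECASE)
    if not match:
        return None
    return f"{int(match.group(1))} {match.group(2).upper()}"


def normalize_volume_ml(raw):
    """Convert '1 L', '1000ml', '0.5l' to integer millilitres."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*(ml|l)\s*", raw, flags=re.IGNORECASE)
    if not match:
        return None
    value, unit = float(match.group(1)), match.group(2).lower()
    return round(value * 1000) if unit == "l" else round(value)


def normalize_color(raw):
    """Resolve known synonyms; unknown values return None for manual review."""
    return COLOR_SYNONYMS.get(raw.strip().lower())


print(normalize_storage("128GB"), normalize_storage(" 128 gb "))
print(normalize_volume_ml("1 L"), normalize_volume_ml("1000ml"))
print(normalize_color("Navy Blue"))
```

Normalizing at write time (on ingest) rather than at query time keeps the site, the feeds, and internal tools reading the same value.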
### 3) Taxonomy Contract (where products live)
- One primary category per product family.
- Prevent duplicate paths that scatter signals.
- Keep category definitions aligned to shopper intent, not only internal org charts.
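The "one primary category per product family" rule is easy to audit. The sketch below assumes you can export pairs of family id and primary category path; the ids and paths shown are made up for illustration.

```python
from collections import defaultdict


def families_with_split_categories(assignments):
    """assignments: iterable of (family_id, category_path) pairs.

    Returns {family_id: sorted category paths} for every family that is
    assigned to more than one primary category.
    """
    paths = defaultdict(set)
    for family_id, category_path in assignments:
        paths[family_id].add(category_path)
    return {fam: sorted(p) for fam, p in paths.items() if len(p) > 1}


assignments = [
    ("tee", "Apparel > Tops > T-Shirts"),
    ("tee", "Apparel > Tops > T-Shirts"),
    ("bottle", "Kitchen > Drinkware"),
    ("bottle", "Sports > Hydration"),  # same family, second path
]

split = families_with_split_categories(assignments)
print(split)  # {'bottle': ['Kitchen > Drinkware', 'Sports > Hydration']}
```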
### 4) Freshness Contract (what must be correct *today*)
- Price, availability, delivery promises, and key constraints must be reliable.
- Suppress or deprioritize facets that lead to dead ends (e.g., a filter value that returns only out-of-stock items).
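As a rough sketch of the dead-end rule, the function below builds facet counts from in-stock products only, so a value that would lead exclusively to out-of-stock items never appears in the filter. The `in_stock` field and the sample products are assumptions; a real implementation would likely live in the search index configuration rather than in application code.

```python
def live_facet_values(products, attribute):
    """Count in-stock products per facet value.

    Values carried only by out-of-stock products are omitted entirely,
    so the filter never offers a choice that ends in a dead end.
    """
    counts = {}
    for product in products:
        value = product.get(attribute)
        if value is None or not product.get("in_stock", False):
            continue
        counts[value] = counts.get(value, 0) + 1
    return counts


products = [
    {"sku": "T1", "color": "Navy", "in_stock": True},
    {"sku": "T2", "color": "Navy", "in_stock": False},
    {"sku": "T3", "color": "Olive", "in_stock": False},  # only Olive item, out of stock
    {"sku": "T4", "color": "Black", "in_stock": True},
]

facets = live_facet_values(products, "color")
print(facets)  # {'Navy': 1, 'Black': 1}
```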
This is not busywork. It is the unglamorous part of growth that compounds.
## What To Measure (So “Data Quality” Isn’t Vague)
If you can’t measure it, it becomes a recurring debate instead of a fix.
A minimal “discovery integrity” scorecard can include:
- **Attribute coverage:** % of SKUs with required attributes populated (by category).
- **Normalization drift:** count of distinct values for “should-be-standard” fields (units, memory sizes, pack counts).
- **Filter dead-ends:** filter values that frequently lead to zero results or out-of-stock results.
- **Search zero-results rate:** share of searches that return nothing, plus the top failing queries (paired with the missing attributes or taxonomy issues behind them).
- **Variant fragmentation rate:** how often variants are split across multiple parents.
The point is to convert “catalog quality” into something operational, owned, and improvable.
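Three of these metrics can be computed from a plain SKU export and a search log. The sketch below is one possible reading of each definition, not a standard: the field names, the whitespace-and-case rule used to detect drift, and the sample data are all assumptions.

```python
def attribute_coverage(skus, required):
    """Share of SKUs where every required attribute is populated."""
    if not skus:
        return 0.0
    complete = sum(
        1 for sku in skus
        if all(str(sku.get(attr) or "").strip() for attr in required)
    )
    return complete / len(skus)


def normalization_drift(skus, attribute):
    """Raw values that differ only by whitespace or casing, grouped by a crude key."""
    groups = {}
    for sku in skus:
        raw = sku.get(attribute)
        if raw:
            key = "".join(raw.split()).lower()
            groups.setdefault(key, set()).add(raw)
    return {key: sorted(values) for key, values in groups.items() if len(values) > 1}


def zero_results_rate(search_log):
    """search_log: list of (query, result_count) pairs."""
    if not search_log:
        return 0.0
    return sum(1 for _, count in search_log if count == 0) / len(search_log)


skus = [
    {"sku": "A1", "storage": "128GB", "color": "Black"},
    {"sku": "B2", "storage": "128 GB", "color": "Black"},
    {"sku": "C3", "storage": "256GB", "color": ""},
    {"sku": "D4", "storage": None, "color": "Blue"},
]
search_log = [("128gb phone", 12), ("navy tee", 0), ("1l bottle", 0), ("mug", 5)]

coverage = attribute_coverage(skus, ("storage", "color"))
drift = normalization_drift(skus, "storage")
zero_rate = zero_results_rate(search_log)
print(coverage, drift, zero_rate)
```

Tracking these per category, rather than catalog-wide, makes it easier to assign an owner to each number.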
## Why This Matters More In 2026
Product discovery is happening in more places than your website:
- shopping modules that rely on structured feeds
- marketplaces that enforce attribute requirements
- social search where credibility and clarity shape what gets trusted
- emerging AI shopping interfaces that compare products using structured signals
In these surfaces, being “eligible” is often a bigger lever than being “ranked.”
So the strategic question becomes:
Are we building a catalog that machines can confidently understand — and shoppers can confidently navigate?
## My Take
If you lead growth, discovery, SEO, or analytics, it’s tempting to chase the visible levers: campaigns, content, bidding, ranking tweaks.
But a surprisingly large share of “growth leaks” comes from:
- inconsistent product identity,
- messy attribute values,
- and a taxonomy that doesn’t match intent.
Those leaks don’t show up as one big error. They show up as thousands of small misses.
And that’s why they’re worth taking seriously.
## Key Takeaways
- Discovery is an eligibility game as much as a ranking game.
- Small catalog inconsistencies fragment filters, dilute signals, and create dead ends.
- Treat identity, attributes, taxonomy, and freshness as explicit contracts with owners.
- Measure discovery integrity (coverage, drift, dead-ends, zero-results), not just conversion.
- Fix one high-volume category end-to-end, then scale the contract.
What’s the single most painful field in your catalog to keep consistent (and what breaks when it drifts)?