Product names
The product name should be short, clear, and describe the product itself, not its variants. Do not include color, size, or other variable features in the name. Avoid repeated words, unnecessary descriptors, special characters, and separators. Each product should have one logical name, regardless of the number of variants.
Correct: Cotton T-Shirt
Incorrect: Cotton T-Shirt - Black / M | PROMO!!!
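A simple automated check can catch most of these naming problems before indexing. The sketch below is a heuristic, not part of any particular search engine. It assumes you know each product's variant values (passed in as the hypothetical `variant_values` argument) and that those values are single words. The set of characters treated as noise is an assumption to adjust for your catalog.

```python
import re

# Characters and separators treated as noise in a name (assumption: adjust to your catalog).
NOISE = re.compile(r"[|!/*#]|\s-\s")


def name_issues(name: str, variant_values: list) -> list:
    """Return human-readable problems found in a product name (heuristic)."""
    issues = []
    # Word tokens, keeping in-word hyphens such as "T-Shirt".
    words = re.findall(r"\w[\w-]*", name.lower())
    # Variant values (color, size, ...) belong in variant fields, not in the name.
    for value in variant_values:
        if value.lower() in words:
            issues.append(f"contains variant value '{value}'")
    # Repeated words usually signal keyword stuffing.
    if len(words) != len(set(words)):
        issues.append("repeated words")
    if NOISE.search(name):
        issues.append("special characters or separators")
    return issues


print(name_issues("Cotton T-Shirt", ["Black", "M"]))  # []
print(name_issues("Cotton T-Shirt - Black / M | PROMO!!!", ["Black", "M"]))
```

The second call reports both variant values and the separators, which is exactly what the incorrect example gets wrong.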
Product descriptions
The description should be unique, detailed, and written in natural language. Use several full sentences that explain the product’s use, features, and context. Do not copy descriptions between products or rely only on keyword lists. The description is the best place for synonyms and for the language your customers actually use.
Correct: Soft cotton T-shirt with a classic fit. Great for everyday wear and easy to layer under a jacket. Breathable fabric with reinforced seams for better durability.
Incorrect: T-shirt, cotton, men, black, M, L, XL, best quality, cheap
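Both rules can be checked in bulk. This is a heuristic sketch with hypothetical function names:

- `looks_like_keyword_list` treats a description with fewer than two sentences as a keyword list.
- `duplicate_descriptions` groups products whose descriptions match after case and whitespace normalization.

The two-sentence threshold is an assumption. Near-duplicates that differ by a few words would need a fuzzier comparison.

```python
import re


def looks_like_keyword_list(text: str, min_sentences: int = 2) -> bool:
    """Heuristic: fewer than `min_sentences` sentences suggests a keyword list."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return len(sentences) < min_sentences


def duplicate_descriptions(descriptions: dict) -> list:
    """Group product IDs whose descriptions match after case/whitespace normalization."""
    groups = {}
    for product_id, text in descriptions.items():
        key = " ".join(text.lower().split())
        groups.setdefault(key, []).append(product_id)
    return [ids for ids in groups.values() if len(ids) > 1]


good = ("Soft cotton T-shirt with a classic fit. Great for everyday wear and easy to "
        "layer under a jacket. Breathable fabric with reinforced seams for better durability.")
bad = "T-shirt, cotton, men, black, M, L, XL, best quality, cheap"
print(looks_like_keyword_list(good), looks_like_keyword_list(bad))  # False True
```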
Product attributes
Attributes should come only from dedicated attribute fields, not from names or descriptions. Use consistent attribute names and value formats across your catalog. Do not put marketing text or long descriptions in attributes; use only specific, clear values.
Correct:
- color: black
- material: 100% cotton
- fit: regular
Incorrect:
- color: black, perfect gift, bestseller, shipping 24h
- material: cotton!!! top quality!!!
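One way to enforce consistent attribute names and values is to validate every record against a fixed attribute schema. In this sketch, `SCHEMA` and its allowed values are hypothetical. Attributes with open-ended values (for example, measurements) need a format check rather than a closed list of values.

```python
# Hypothetical attribute schema: allowed attribute names and their allowed values.
SCHEMA = {
    "color": {"black", "white", "navy"},
    "material": {"100% cotton", "polyester"},
    "fit": {"regular", "slim", "oversized"},
}


def attribute_errors(attrs: dict) -> list:
    """Return schema violations: unknown attribute names or values outside the allowed set."""
    errors = []
    for key, value in attrs.items():
        if key not in SCHEMA:
            errors.append(f"unknown attribute '{key}'")
        elif value not in SCHEMA[key]:
            errors.append(f"{key}: unexpected value '{value}'")
    return errors


print(attribute_errors({"color": "black", "material": "100% cotton", "fit": "regular"}))  # []
```

Both incorrect values above would be rejected, because marketing text never matches a value in a closed vocabulary.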
Product variants
Differences such as color, size, or other options should be handled as variants, not as separate products with different names. The product name should be the same for all variants. If variants are indexed separately, shared data (description, category, product type) must be identical in every variant record.
Correct:
- Product: Cotton T-Shirt
- Variants: color: black, size: S / M / L / XL (same name; shared fields such as description and category are identical)
Incorrect: separate products named Cotton T-Shirt Black M and Cotton T-Shirt White L
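If your pipeline indexes each variant as its own record, you can generate those records from a single product definition. That keeps the shared fields identical by construction. This is a minimal sketch, assuming records are plain dictionaries. The field names `product_type` and `sku` and the SKU values are hypothetical.

```python
# Fields every variant record of one product must share (product_type is hypothetical).
SHARED_FIELDS = ("name", "description", "category", "product_type")


def expand_variants(product: dict, variants: list) -> list:
    """Build one index record per variant, copying the shared fields unchanged."""
    shared = {f: product[f] for f in SHARED_FIELDS}
    # Variant-specific fields (sku, color, size) are merged on top of the shared ones.
    return [{**shared, **variant} for variant in variants]


def inconsistent_shared_fields(records: list) -> set:
    """Return the shared fields whose values differ between variant records."""
    return {f for f in SHARED_FIELDS if len({r.get(f) for r in records}) > 1}


product = {
    "name": "Cotton T-Shirt",
    "description": "Soft cotton T-shirt with a classic fit.",
    "category": "Apparel > T-Shirts > Men",
    "product_type": "t-shirt",
}
records = expand_variants(product, [
    {"sku": "TS-BLK-S", "color": "black", "size": "S"},
    {"sku": "TS-BLK-M", "color": "black", "size": "M"},
])
print(inconsistent_shared_fields(records))  # set()
```

Running `inconsistent_shared_fields` over the incorrect example would flag `name`, because the two records use different names.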
Categories
Assign each product a logical and consistent category. Categories should be hierarchical and used consistently throughout the store. Do not use categories as tags or to describe product features. Semantic duplicates (different names for the same category) reduce search quality.
Correct: Apparel > T-Shirts > Men
Incorrect: Awesome T-Shirts, Black, Gym wear, Bestsellers (tags and features, not categories)
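Semantic duplicates can be collapsed by mapping every incoming category to one canonical path. The category tree and alias table below are hypothetical. A value that maps to no known path, such as a tag like "Bestsellers", is rejected rather than indexed as a category.

```python
from typing import Optional

# Hypothetical canonical category tree, stored as full hierarchical paths.
CATEGORIES = {
    "Apparel > T-Shirts > Men",
    "Apparel > T-Shirts > Women",
    "Accessories > Backpacks",
}
# Hypothetical map from semantic duplicates to the canonical path.
ALIASES = {
    "Clothing > Tees > Men": "Apparel > T-Shirts > Men",
}


def canonical_category(raw: str) -> Optional[str]:
    """Return the canonical category path, or None if `raw` is not a known category."""
    # Normalize spacing around the ">" separator before lookup.
    path = " > ".join(part.strip() for part in raw.split(">"))
    path = ALIASES.get(path, path)
    return path if path in CATEGORIES else None


print(canonical_category("Apparel>T-Shirts > Men"))  # Apparel > T-Shirts > Men
print(canonical_category("Bestsellers"))  # None
```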
Language consistency
Product data should use a single language within one index. Do not mix languages or styles of expression in names, descriptions, or attributes. Place synonyms and alternative names in descriptions or dedicated fields, not in the product name.
Data normalization
Before indexing, clean and standardize the data: letter case, extra spaces, typos, and value formats. The same information must appear identically for all products so the search engine can interpret it correctly.
Correct:
- color: black (always the same spelling and case)
- size: 42 (consistent format)
Incorrect:
- color: Black, black , BLACK, blk
- size: 42, 42 EU, EU42
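Rules like these are easy to apply in code before indexing. This sketch rests on two assumptions:

- The synonym table is hypothetical. Build yours from the spellings that actually appear in your feed.
- `normalize_size` assumes every numeric size is an EU size.

```python
import re

# Hypothetical synonym table mapping known misspellings/abbreviations to canonical values.
COLOR_SYNONYMS = {"blk": "black"}


def normalize_color(value: str) -> str:
    """Lowercase, collapse whitespace, then map known synonyms."""
    v = " ".join(value.split()).lower()
    return COLOR_SYNONYMS.get(v, v)


def normalize_size(value: str) -> str:
    """Reduce EU size spellings such as '42 EU' or 'EU42' to '42' (assumes EU sizes)."""
    v = value.strip().upper()
    match = re.fullmatch(r"(?:EU\s*)?(\d+(?:\.\d+)?)(?:\s*EU)?", v)
    # Non-numeric sizes such as "M" are returned uppercased but otherwise unchanged.
    return match.group(1) if match else v


print({normalize_color(c) for c in ["Black", "black ", "BLACK", "blk"]})  # {'black'}
print({normalize_size(s) for s in ["42", "42 EU", "EU42"]})  # {'42'}
```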
Uniqueness and completeness
Each product should have a unique identifier and a complete set of key data, such as name, description, and category. Empty fields and placeholder values (such as “TBD”, “-”, or test IDs) lower search quality.
Correct:
- id: 834719
- name: City Backpack
- description: …
- category: Accessories > Backpacks
- price_usd: 79.99
Incorrect:
- id: null or id: test123
- description: TBD / - / empty
- missing category
- price: $0 as a placeholder
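A pre-indexing check can reject incomplete records. All of the rules below are illustrative and should be matched to your own catalog. The sketch assumes that:

- the required fields are those listed in the correct example;
- strings such as "TBD" or "-" are placeholders;
- IDs starting with "test" are test records.

```python
# Hypothetical rules: adjust the required fields and placeholder list to your catalog.
REQUIRED = ("id", "name", "description", "category", "price_usd")
PLACEHOLDERS = {"", "-", "tbd", "n/a", "null", "none"}


def completeness_errors(products: list) -> list:
    """Return problems found across a list of product records."""
    errors = []
    seen_ids = set()
    for i, p in enumerate(products):
        for f in REQUIRED:
            value = p.get(f)
            # Missing keys, None, and placeholder strings all count as empty.
            if value is None or (isinstance(value, str) and value.strip().lower() in PLACEHOLDERS):
                errors.append(f"product #{i}: missing or placeholder {f}")
        pid = p.get("id")
        if pid is not None:
            if pid in seen_ids:
                errors.append(f"product #{i}: duplicate id {pid!r}")
            seen_ids.add(pid)
            if isinstance(pid, str) and pid.lower().startswith("test"):
                errors.append(f"product #{i}: test id {pid!r}")
        price = p.get("price_usd")
        # A zero or negative price is treated as a placeholder, not a real price.
        if isinstance(price, (int, float)) and price <= 0:
            errors.append(f"product #{i}: non-positive price {price}")
    return errors
```

Checking the whole batch at once, rather than record by record, is what makes the duplicate-ID check possible.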
Numeric and structured data
Prices, weights, dimensions, and other numeric values should be stored in numeric fields. Any accompanying text, such as a unit or currency, must follow the same convention for every product; the correct examples below put it in the field name (price_usd, weight_kg). Dates should use a single, consistent format, such as ISO 8601 (YYYY-MM-DD).
Correct:
- price_usd: 199.99
- weight_kg: 0.45
- length_cm: 30, width_cm: 20, height_cm: 10
- release_date: 2026-02-14
Incorrect:
- price: “$199.99 incl. tax” (text instead of a numeric field)
- weight: “0.45 kg / lightweight”
- dimensions: “30x20x10cm approx.” (inconsistent free text)
- dates mixed as 14/02/26, 2026.02.14, Feb 14th 2026
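Free-text values like the incorrect examples can often be converted automatically, but anything ambiguous should be reviewed by hand. The helpers in this minimal sketch are hypothetical, not part of any search engine's API:

- `parse_date` tries a fixed list of formats and assumes day-first order for dates such as 14/02/26.
- `parse_number` takes the first number in a string and assumes "," is a thousands separator.
- `parse_dimensions_cm` assumes centimetres in length x width x height order.

```python
import re
from datetime import date, datetime

# Formats seen in the incorrect examples; day-first "%d/%m/%y" is an assumption,
# and "%b" expects English month abbreviations (default C locale).
DATE_FORMATS = ("%Y-%m-%d", "%Y.%m.%d", "%d/%m/%y", "%b %d %Y")
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def parse_date(raw: str) -> date:
    """Parse a date in one of DATE_FORMATS, ignoring ordinal suffixes like '14th'."""
    cleaned = re.sub(r"(\d+)(st|nd|rd|th)\b", r"\1", raw.strip())
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {raw!r}")


def parse_number(raw: str) -> float:
    """Return the first number in a string; ',' is assumed to be a thousands separator."""
    match = NUMBER.search(raw.replace(",", ""))
    if match is None:
        raise ValueError(f"no number in {raw!r}")
    return float(match.group())


def parse_dimensions_cm(raw: str) -> tuple:
    """Parse 'LxWxH' text into (length, width, height), assuming centimetres."""
    values = [float(v) for v in NUMBER.findall(raw)]
    if len(values) != 3:
        raise ValueError(f"expected three dimensions in {raw!r}")
    return tuple(values)


for raw in ("14/02/26", "2026.02.14", "Feb 14th 2026"):
    print(parse_date(raw).isoformat())  # 2026-02-14
print(parse_number("$199.99 incl. tax"))  # 199.99
print(parse_dimensions_cm("30x20x10cm approx."))  # (30.0, 20.0, 10.0)
```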
Data approach
The search engine uses only the provided data and does not interpret context. The more organized, predictable, and semantically clean the product data, the better the search, ranking, and filtering results.
Correct: Keep names stable, store options as variants, and keep attributes consistent so search, ranking, and filters work reliably.
Incorrect: Stuffing everything into the name or description (“BLACK M SALE HIT!!!”) and mixing marketing text into attributes makes the data unpredictable and hurts search quality.
TL;DR – preparing product data for search
- Product names should be short, clear, and not include variants (color, size, etc.).
- Descriptions must be unique, detailed, and written in natural language, not just keyword lists.
- Keep attributes only in attribute fields, with one consistent name and value format.
- Handle variants as variants, not as separate products with different names.
- Categories should be logical, hierarchical, and used consistently throughout the store.
- Use a single data language; do not mix languages in names, descriptions, or attributes.
- Clean and normalize data: consistent case and formats, no typos, repetitions, unnecessary characters, or placeholders.
- Each product must have all key information and a unique ID.
- Store numeric values (prices, dimensions, weights) as numbers; any accompanying text, such as a unit or currency, must follow the same convention for all products.
- The simpler and more organized the data, the better the search and filtering results.
TL;DR Correct data example
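A complete record that follows the rules above might look like this. It is assembled from the per-section examples. The `id`, the `price_usd` value, the SKUs, and the nesting of `attributes` and `variants` are hypothetical. Use whatever structure your search engine's import format expects.

```python
# Hypothetical well-formed product record built from the "correct" examples above.
correct_product = {
    "id": 834720,  # hypothetical unique ID
    "name": "Cotton T-Shirt",  # one name for all variants
    "description": (
        "Soft cotton T-shirt with a classic fit. Great for everyday wear and easy "
        "to layer under a jacket. Breathable fabric with reinforced seams for "
        "better durability."
    ),
    "category": "Apparel > T-Shirts > Men",  # one hierarchical path
    "attributes": {"material": "100% cotton", "fit": "regular"},  # clear values only
    "price_usd": 24.99,  # hypothetical price, stored as a number
    "release_date": "2026-02-14",  # ISO 8601 date
    "variants": [  # options live here, not in the name
        {"sku": "TS-BLK-S", "color": "black", "size": "S"},
        {"sku": "TS-BLK-M", "color": "black", "size": "M"},
        {"sku": "TS-BLK-L", "color": "black", "size": "L"},
        {"sku": "TS-BLK-XL", "color": "black", "size": "XL"},
    ],
}
```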
TL;DR Incorrect data example
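For comparison, here is the same product written with the mistakes described above. The values come from the incorrect examples in the earlier sections. The field names are hypothetical, and each comment names the rule that field breaks.

```python
# Hypothetical badly formed record; each comment names the broken rule.
incorrect_product = {
    "id": "test123",  # placeholder ID
    "name": "Cotton T-Shirt - Black / M | PROMO!!!",  # variant values and promo text in the name
    "description": "T-shirt, cotton, men, black, M, L, XL, best quality, cheap",  # keyword list
    "category": "Awesome T-Shirts",  # tag-like label, not a hierarchical category
    "attributes": {
        "color": "black, perfect gift, bestseller, shipping 24h",  # marketing text in an attribute
        "material": "cotton!!! top quality!!!",  # marketing text and special characters
    },
    "price": "$199.99 incl. tax",  # text instead of a numeric field
    "release_date": "Feb 14th 2026",  # non-standard date format
}
```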
