Vicco Labs
Building a production conversational assistant · Part 10
28 special characters, booleans that don't exist

QueryBuilder: turning a Pydantic object into a safe FT.SEARCH query

Building FT.SEARCH queries manually is where you discover RediSearch silently interprets '&' as AND — no error, no exception.

20 APR 2026·4 min read·FT.SEARCH / Pydantic / Redis Stack / QueryBuilder

In the Redis Stack posts I showed the Fat/Slim/Hybrid model and how the modeling decision directly affects what the LLM receives. Today I want to show the layer between the DSPy RouterOutput and Redis: the QueryBuilder.

It's the component that turns a Pydantic object with semantic filters into a valid, safe, correct FT.SEARCH string. Sounds trivial. It isn't.

The problem with building FT.SEARCH queries manually

The first version of the search code built the query as an f-string:
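The pattern looks roughly like the hypothetical sketch below. The function name naive_query and the two fields are assumptions chosen for illustration, not the original code:

```python
def naive_query(brand: str, free_shipping: bool) -> str:
    # Values are interpolated as-is: no escaping, no normalization.
    return f"@brand:{{{brand}}} @free_shipping:{{{free_shipping}}}"


# The '&', the spaces and the '.' reach the parser unescaped,
# and the Python bool is rendered as "True" with a capital T.
print(naive_query("Nike & Co.", True))
```

Nothing here raises: the string is syntactically fine as Python, so any problem only surfaces later as an empty result set.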

Three immediate problems:

  • Special characters break the parser silently: RediSearch has 28 special characters in TAG fields that need backslash escaping. A product name like "Nike Air (Plus)" without escaping becomes @name:{Nike Air (Plus)}, which RediSearch parses as a malformed alternative group. Result: zero results, no error.
  • Booleans don't exist in RediSearch: there is no native boolean field type. Fields like free_shipping and in_stock have to be stored and queried as TAG with the strings "true"/"false". If upstream sends True, 1, "yes", or "TRUE", the value has to be normalized before it goes into the query.
  • LLM input isn't trustworthy: DSPy returns text_search="Nike Air Max 270". That has to become a tokenized fuzzy query, not a literal substitution — otherwise you either match too exactly (zero results for misspellings) or too loosely (irrelevant results).

The architecture: Pydantic model → QueryBuilder → FT.SEARCH string

The Shelf Filter Model is the contract between router and builder. Each field has an explicit Python type and maps to the corresponding Redis field:

extra="forbid" ensures unexpected fields from the LLM don't pass silently — they raise ValidationError immediately.
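A minimal sketch of such a contract, assuming Pydantic v2. The class name ShelfFilter and the exact field list are assumptions pieced together from the fields mentioned in this post, not the production model:

```python
from typing import Optional

from pydantic import BaseModel, ConfigDict, ValidationError


class ShelfFilter(BaseModel):
    # Hypothetical model: unknown keys are rejected instead of ignored.
    model_config = ConfigDict(extra="forbid")

    category: Optional[str] = None
    gender: Optional[str] = None
    brand: Optional[str] = None
    price_max: Optional[float] = None
    free_shipping: Optional[bool] = None
    product_name: Optional[str] = None


# A field the model doesn't declare fails fast.
try:
    ShelfFilter(category="Tenis", colour="red")
except ValidationError as exc:
    print(f"rejected: {exc.error_count()} error(s)")
```

The point of extra="forbid" is that a hallucinated field name becomes a visible failure at the boundary rather than a filter that is quietly dropped.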

TAG escaping: 28 special characters

RediSearch has a list of special characters that need escaping in TAG field values. Most internet examples only escape the obvious ones (-, .). In practice, product names contain (, ), +, &, /, and others.

The bug: "Nike & Co." without escaping becomes @brand:{Nike & Co.}. RediSearch interprets "&" as the AND operator inside the TAG group and parses the query as @brand:{Nike} AND @brand:{Co.}. That returns zero results, with no error message and no exception.
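A minimal escaping sketch. It does not reproduce the exact 28-character list; as a conservative assumption it backslash-escapes every character that is not a letter, digit, or underscore, which covers the characters named above plus spaces. The names escape_tag and tag_clause are hypothetical:

```python
import re

# Assumption: anything outside letters, digits and underscore gets escaped.
_NEEDS_ESCAPE = re.compile(r"[^\w]")


def escape_tag(value: str) -> str:
    """Backslash-escape every non-word character in a TAG value."""
    return _NEEDS_ESCAPE.sub(lambda m: "\\" + m.group(0), value)


def tag_clause(field: str, value: str) -> str:
    """Build a single @field:{value} clause with the value escaped."""
    return f"@{field}:{{{escape_tag(value)}}}"


print(tag_clause("brand", "Nike & Co."))
print(tag_clause("name", "Nike Air (Plus)"))
```

Escaping too much is harmless here, since a backslash before an ordinary character is simply consumed, while escaping too little is exactly the silent failure described above.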

Booleans as TAG: the normalization upstream doesn't do for you

The query side is simple. The real problem is normalizing the value before it gets here. Catalog APIs return True, 1, "yes", or "TRUE" for the same field. Pydantic with a validator solves it:

Without that validator, a field free_shipping="TRUE" would pass Pydantic as a string and reach _format_tag_boolean as a truthy value, generating @free_shipping:{True} (capital T). If the TAG field is declared CASESENSITIVE, that doesn't match the index storing "true" (lowercase): zero results, no error.
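The normalization itself can be sketched as a plain function. The accepted spellings and the names normalize_bool and format_tag_boolean are assumptions for illustration. In Pydantic v2 a function like this could be attached to the field through field_validator with mode="before":

```python
_TRUE = {"true", "1", "yes", "y", "on"}
_FALSE = {"false", "0", "no", "n", "off"}


def normalize_bool(value: object) -> bool:
    """Map the spellings upstream tends to send onto a real bool."""
    if isinstance(value, bool):
        return value
    if isinstance(value, int) and value in (0, 1):
        return bool(value)
    if isinstance(value, str):
        key = value.strip().lower()
        if key in _TRUE:
            return True
        if key in _FALSE:
            return False
    # Anything else is rejected rather than guessed at.
    raise ValueError(f"cannot interpret {value!r} as a boolean")


def format_tag_boolean(field: str, value: object) -> str:
    """Always emit lowercase 'true'/'false', whatever came in."""
    literal = "true" if normalize_bool(value) else "false"
    return f"@{field}:{{{literal}}}"


print(format_tag_boolean("free_shipping", "TRUE"))
```

Raising on unknown spellings is a deliberate choice in this sketch: treating an unrecognized string as truthy is the bug the paragraph above describes.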

Fuzzy matching with tokenization rules

The product_name field is the most delicate one. DSPy returns free text. You want neither an exact match (fails on any spelling variation) nor a pure wildcard (returns everything).

The solution is to tokenize and apply rules by token size:

The result for "Nike Air Max 270":

That matches "Nike Air Max 270 Black", "Nike Air Max 270 React" and tolerates "Nikke" (Levenshtein 1). It doesn't match "adidas Air Max" because %%Nike%% won't match "adidas".
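A minimal sketch of size-based tokenization. The threshold and the rules are assumptions: purely numeric tokens and tokens shorter than four characters stay exact, and longer tokens are wrapped in %%...%% (RediSearch's fuzzy syntax for Levenshtein distance up to 2). The name build_name_clause is hypothetical:

```python
import re

FUZZY_MIN_LEN = 4  # assumption: shorter tokens are too ambiguous to fuzz


def build_name_clause(text: str) -> str:
    """Turn free text into a space-separated (AND) list of terms."""
    # Keep only alphanumeric runs, so query syntax characters never survive.
    tokens = re.findall(r"[^\W_]+", text)
    parts = []
    for token in tokens:
        if token.isdigit() or len(token) < FUZZY_MIN_LEN:
            parts.append(token)  # exact term
        else:
            parts.append(f"%%{token}%%")  # fuzzy term
    return " ".join(parts)


print(build_name_clause("Nike Air Max 270"))
```

Keeping model numbers like 270 exact matters: a fuzzy match on digits would happily pair one model with a different one.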

Sanitization: RediSearch injection is real

The LLM can (and eventually will) return text containing RediSearch commands. text_search="sneakers LIMIT 0 1000 SORTBY price" doesn't raise an exception — it just executes.

The fallback is "*", a wildcard search that returns all documents in the index. It's fail-open: you deliver a generic response instead of executing an injected command. In contexts where fail-closed would be more appropriate, the function can raise an exception instead of returning "*".
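One way to sketch this guard, under stated assumptions: the blocked-keyword set below is a hand-picked subset of FT.SEARCH argument names, the allowed-character pattern is deliberately narrow, and the name sanitize_text_search and its fail_closed flag are hypothetical:

```python
import re

# Assumption: a subset of FT.SEARCH argument names worth rejecting.
_BLOCKED_KEYWORDS = {
    "LIMIT", "SORTBY", "RETURN", "FILTER", "GEOFILTER",
    "INKEYS", "INFIELDS", "PARAMS", "DIALECT", "NOCONTENT",
}
# Assumption: letters, digits, underscore, whitespace, '.' and '-' only.
_ALLOWED = re.compile(r"^[\w\s.\-]+$")


def sanitize_text_search(text: str, fail_closed: bool = False) -> str:
    """Return the text if it looks harmless, else '*' or raise."""
    stripped = text.strip()
    words = {word.upper() for word in stripped.split()}
    suspicious = (
        not stripped
        or _ALLOWED.match(stripped) is None
        or bool(words & _BLOCKED_KEYWORDS)
    )
    if not suspicious:
        return stripped
    if fail_closed:
        raise ValueError(f"rejected text_search: {text!r}")
    return "*"  # fail-open fallback


print(sanitize_text_search("sneakers LIMIT 0 1000 SORTBY price"))
```

A keyword blocklist has an obvious cost: a legitimate search such as "water filter" is rejected too. Whether that trade-off is acceptable depends on the catalog.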

Template cache: don't rebuild the same query twice

For structured filters without text_search (e.g. {category: "Tenis", gender: "Masculino", free_shipping: True}), the dynamic part of the query is always the same for the same filter set. The QueryBuilder caches:

product_name stays out of the cache — it's the most variable field and the only one that goes through fuzzy tokenization. The structured filters (TAG and NUMERIC) are stable for the same set of values.

What the QueryBuilder delivers to Redis Stack

For a query with {category: "Tenis", gender: "Masculino", price_max: 300, free_shipping: True, product_name: "Nike Air Max"}:

Which Redis executes as:

The LLM receives 5 fields × N products, not 60 fields or an 800-token JSON blob.
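The final assembly can be sketched as a function that returns the FT.SEARCH arguments as a list. Everything specific here is an assumption: the index name idx:products, the numeric field name price, the five RETURN fields, and the function name build_search_args. Inputs are assumed to be escaped and tokenized already:

```python
from typing import Optional


def build_search_args(
    index: str,
    tags: dict,
    price_max: Optional[float],
    free_shipping: Optional[bool],
    name_clause: str,
    return_fields: list,
) -> list:
    """Assemble FT.SEARCH arguments; clauses are ANDed by whitespace."""
    parts = [f"@{field}:{{{value}}}" for field, value in tags.items()]
    if price_max is not None:
        parts.append(f"@price:[-inf {price_max}]")  # NUMERIC range
    if free_shipping is not None:
        literal = "true" if free_shipping else "false"
        parts.append(f"@free_shipping:{{{literal}}}")
    if name_clause:
        parts.append(name_clause)
    query = " ".join(parts) or "*"
    # RETURN limits the payload to the listed fields.
    return ["FT.SEARCH", index, query,
            "RETURN", str(len(return_fields)), *return_fields]


args = build_search_args(
    "idx:products",
    {"category": "Tenis", "gender": "Masculino"},
    300, True, "%%Nike%% Air Max",
    ["name", "brand", "price", "url", "free_shipping"],
)
print(" ".join(args))
```

Passing the query as a single list element, for example to redis-py's execute_command(*args), keeps it one argument on the wire instead of a string that gets split on whitespace.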

Next week: the complete pipeline, tracing a query from the user's message to the formatted response (supervisor, router, anaphora, DSPy, tool, generate, critique) in a single end-to-end view.