Vicco Labs
Writing

Field notes, series, and essays. Written first for me, then for you.



Series

Building a production conversational assistant

A technical series on routing, tools, compliance, observability, and everything between the LLM and the database.

Building a production conversational assistant · Part 10 · FT.SEARCH

QueryBuilder: turning a Pydantic object into a safe FT.SEARCH query

Building FT.SEARCH queries manually is where you discover RediSearch silently interprets '&' as AND — no error, no exception.

20 APR 2026 · 4 min read

Building a production conversational assistant · Part 9 · DSPy

dspy.Refine: runtime self-correction without recompiling the model

DSPy outside offline mode: generate, evaluate against a reward function, and if it fails, try again - before critique_node steps in.

13 APR 2026 · 4 min read

Building a production conversational assistant · Part 8 · Observability

Observability in a LangGraph graph: what Langfuse sees that the log doesn't

Logs cover what happened inside each node. They don't answer 'did the fallback rate climb in the last 30 minutes?'. For that, Langfuse.

8 APR 2026 · 4 min read

Building a production conversational assistant · Part 7 · LLM Routing

Three routers, three different problems: DSPy, custom Semantic Router, and Aurélio AI

Before building the custom one I evaluated an open-source library that almost made it into the project. This is the comparison I wish I'd read before making those decisions.

1 APR 2026 · 10 min read

Building a production conversational assistant · Part 6 · DSPy

DSPy in practice: what changes when the router is already an LLM but isn't yet compilable

The problem DSPy solves isn't the absence of AI in routing. It's the absence of a contract on that AI's output.

26 MAR 2026 · 7 min read

Building a production conversational assistant · Part 5 · Compliance

Regulatory guardrails in investment assistants: CVM, ANBIMA, and the LGPD paradox

Between the LLM generating a response and it reaching the customer is where a regulatory violation can happen - without intent, without malice, and with no possibility of reversing it.

21 MAR 2026 · 5 min read

Building a production conversational assistant · Part 4 · Memory

Agent memory: episodic, semantic, and procedural

Confusing the three types of memory is where banking LLM projects fail structurally. Cognitive psychology already had the right taxonomy; it just had to be translated to infrastructure.

16 MAR 2026 · 6 min read

Building a production conversational assistant · Part 3 · DSPy

DSPy, the framework that treats prompts as compilable code, not as strings

Instead of writing prompts, you program declarative modules and let the framework compile optimized prompts - based on data, metrics, and the model you're using.

14 MAR 2026 · 6 min read

Building a production conversational assistant · Part 2 · Redis Stack

Fat vs Slim vs Hybrid in Redis Stack: the model that changed how I think about retrieval for LLMs

When volume grows and the LLM starts getting lost in the context, the modeling decision is as important as the database choice. Fat, Slim, or Hybrid - which one stuck?

11 MAR 2026 · 4 min read

Building a production conversational assistant · Part 1 · Redis

Have you used Redis for more than simple caching?

A cache miss became a slow API call, p99 climbed, LLM cost climbed. That's where I discovered Redis Stack as a deterministic retrieval and analytics layer for LLM applications.

4 MAR 2026 · 3 min read