Vicco Labs
Building a production conversational assistant · Part 4
Five layers - three for the agent, two for the regulator

Agent memory: episodic, semantic, and procedural

Confusing the three types of memory is where banking LLM projects fail structurally. Cognitive psychology already had the right taxonomy; it just had to be translated to infrastructure.

16 MAR 2026·6 min read·Memory / LLM Agents / Investment Assistant / Compliance

When you build a conversational assistant for e-commerce or a product catalog, the persistence question is technical: latency, cost, TTL.

When you do the same for a bank or asset manager, the nature of the question changes. It becomes technical, architectural, and regulatory at once - and the wrong answer shows up not in a benchmark, but in an audit.

I started by treating memory as a single thing. It isn't. Production experience taught me that a financial assistant needs at least five distinct layers, each with its own purpose, technology, and lifecycle. But before talking about layers, you need the taxonomy.

The taxonomy

Cognitive psychology distinguishes three types of human memory. When I started mapping what the assistant needed to "remember", I realized the same classification applies - and that confusing the three types is where banking LLM projects fail structurally.

Mapped to real infrastructure, each type gets its own layer, its own lifecycle, and - in banking - its own regulatory impact.

Episodic memory: the diary of every conversation

It's the memory of sequential events: what happened, in what order, with what result. For the assistant, it's the message history, the executed tool calls, the graph state checkpoints.

I use LangGraph's Redis checkpointer as the short-term backend.

The Redis choice here isn't arbitrary. For short-term episodic memory, what matters is read/write speed - the graph queries state at every node. In-memory Redis delivers sub-millisecond latency, without PostgreSQL's disk I/O overhead.

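In practice, this memory holds the per-session message history, the executed tool calls with their results, and the graph state checkpoints.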

Lifecycle: purge after 30-60 days. Only active sessions need to live here. Closed sessions migrate to the other layers via Kafka events.
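The lifecycle above can be sketched without any infrastructure. What follows is a minimal in-process stand-in, not LangGraph's Redis checkpointer: `EpisodicStore`, its methods, and the 30-day `RETENTION_SECONDS` default are all hypothetical names chosen for illustration. It only shows the shape of the idea: checkpoints keyed by session, and a purge that drops sessions whose last activity is older than the retention window.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Optional

# Assumption: lower bound of the 30-60 day window mentioned in the text.
RETENTION_SECONDS = 30 * 24 * 3600


@dataclass
class EpisodicStore:
    """In-process stand-in for a Redis-backed checkpoint store (illustration only)."""

    retention: float = RETENTION_SECONDS
    _threads: dict[str, list[tuple[float, dict[str, Any]]]] = field(default_factory=dict)

    def put(self, thread_id: str, state: dict[str, Any], now: Optional[float] = None) -> None:
        """Append a checkpoint (a copy of the graph state) to the session's history."""
        ts = time.time() if now is None else now
        self._threads.setdefault(thread_id, []).append((ts, dict(state)))

    def latest(self, thread_id: str) -> Optional[dict[str, Any]]:
        """Return the most recent checkpoint for a session, or None if unknown."""
        history = self._threads.get(thread_id)
        return history[-1][1] if history else None

    def purge(self, now: Optional[float] = None) -> list[str]:
        """Drop sessions whose last checkpoint is older than the retention window."""
        ts = time.time() if now is None else now
        expired = [
            thread_id
            for thread_id, history in self._threads.items()
            if ts - history[-1][0] > self.retention
        ]
        for thread_id in expired:
            del self._threads[thread_id]
        return expired
```

In a real deployment the purge would more likely be a key TTL in Redis itself, and the migration of closed sessions would happen before the key expires. The `now` parameter exists only to make the behavior easy to check with fixed timestamps.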

The analogy is what you remember of what you did today: detailed, sequential, but it doesn't stay in your head forever.

Semantic memory: what the agent knows about the customer

It's the memory of facts and relationships: who this customer is, what preferences they've shown, how their current risk profile compares with the previous one, which products they've consulted. Not the event itself - the knowledge distilled from events.

The critical difference for the investment domain: semantic memory carries suitability. The agent doesn't start a session about funds from scratch - it already knows that this customer has a Moderate profile (the profile is also re-checked in real time to catch changes), has looked at FIDCs before, and showed interest but didn't invest the last time they saw a return above 15% p.a.

This isn't just UX. It's compliance: Brazil's CVM Resolution 30 requires suitability to be verified before any product is presented (if the customer declines a profile analysis, treating them as "Conservative" is permitted). When semantic memory is correctly populated, the agent doesn't need to ask for the risk profile every session - the shelf tool filters the product shelf by it before any response is generated.
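A minimal sketch of that filtering step, under stated assumptions: `RiskProfile`, `Product`, `CustomerSemanticMemory`, and `filter_shelf` are hypothetical names, the `AGGRESSIVE` level is invented to complete the scale, and reducing suitability to a single minimum profile per product is a simplification of what a real suitability check involves. The one rule taken from the text is the fallback: a customer with no profile on file is treated as Conservative.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class RiskProfile(IntEnum):
    CONSERVATIVE = 1
    MODERATE = 2
    AGGRESSIVE = 3  # assumption: the source only names Conservative and Moderate


@dataclass(frozen=True)
class Product:
    name: str
    min_profile: RiskProfile  # simplification: one threshold per product


@dataclass
class CustomerSemanticMemory:
    customer_id: str
    risk_profile: Optional[RiskProfile] = None  # None = no profile analysis on file
    products_consulted: tuple[str, ...] = ()


def effective_profile(memory: CustomerSemanticMemory) -> RiskProfile:
    """Fall back to Conservative when the customer has no profile on file."""
    if memory.risk_profile is None:
        return RiskProfile.CONSERVATIVE
    return memory.risk_profile


def filter_shelf(memory: CustomerSemanticMemory, shelf: list[Product]) -> list[Product]:
    """Keep only products whose minimum profile the customer meets."""
    profile = effective_profile(memory)
    return [product for product in shelf if product.min_profile <= profile]
```

Run against a three-product shelf, a Moderate customer would see the Conservative and Moderate products, while a customer with no profile would see only the Conservative one. The point of the design is ordering: the filter runs on the shelf before the model sees any product, so an unsuitable product never reaches the prompt.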

The analogy is what you know about a friend of many years (not what they said yesterday, but who they are).

Procedural memory: how the agent acts

It's the memory of skills and rules: neither what happened nor what the agent knows, but how it should behave. For an investment assistant, this includes the DSPy-optimized prompts, the regulatory validation patterns, and the compiled routing criteria.

Procedural memory is the only one of the three that doesn't change during execution. It's loaded at startup and stays stable for the lifetime of the process. When DSPy is recompiled with new examples, or when a CVM resolution changes a validation pattern, you update procedural memory through a deploy, not at runtime.
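One way to make "loaded at startup, never changed at runtime" hold in code rather than by convention is to load the artifact into a read-only structure. The sketch below assumes a hypothetical JSON artifact with `version`, `prompts`, and `validation_patterns` keys; it is not the format DSPy itself writes, and `ProceduralMemory` and `load_procedural_memory` are illustrative names.

```python
import json
from dataclasses import dataclass
from types import MappingProxyType
from typing import Mapping


@dataclass(frozen=True)
class ProceduralMemory:
    """Read-only snapshot of prompts and rules, built once per process."""

    version: str
    prompts: Mapping[str, str]
    validation_patterns: tuple[str, ...]


def load_procedural_memory(raw_json: str) -> ProceduralMemory:
    """Parse the deployed artifact into an immutable structure."""
    data = json.loads(raw_json)
    return ProceduralMemory(
        version=data["version"],
        # MappingProxyType exposes a read-only view of the private dict copy.
        prompts=MappingProxyType(dict(data["prompts"])),
        validation_patterns=tuple(data["validation_patterns"]),
    )
```

With this shape, an accidental `memory.prompts["router"] = ...` inside a node raises `TypeError`, and reassigning a field raises `FrozenInstanceError`. Changing a rule then requires shipping a new artifact, which also gives every response a `version` to log next to it.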

The analogy is riding a bike. You don't think - your body just knows. The agent doesn't deliberate over each compliance rule; the critique_node simply executes.

How the three types connect in practice

Without semantic memory, the agent starts from zero every session.

Without updated procedural memory, critique_node uses outdated rules.

Without episodic memory, the agent can't resolve anaphora. Phrases like "this product" have no referent.

The two layers that serve the regulator, not the agent

The three memories above serve the agent at runtime. The next two layers exist for another purpose, and are equally mandatory.

Analytics answers questions no operational memory handles well: "Which customers made more than 3 redemptions above R$ 50,000 in the last 45 days?" or "In which cases did the guardrail block a response for a CVM 30 violation in the last quarter?"
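The first of those questions is a plain aggregate once the events sit in a queryable table. Here is a sketch using SQLite purely so it runs anywhere; the `redemptions` table, its columns, and the function names are hypothetical, and a production analytics layer would sit on a warehouse rather than an in-memory database.

```python
import sqlite3
from datetime import date, timedelta


def create_analytics_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical redemptions table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE redemptions ("
        " customer_id TEXT NOT NULL,"
        " amount_brl REAL NOT NULL,"
        " redeemed_on TEXT NOT NULL)"  # ISO 8601 date, so string comparison works
    )
    return conn


def frequent_large_redeemers(
    conn: sqlite3.Connection,
    today: date,
    min_amount: float = 50_000,
    window_days: int = 45,
    min_count: int = 3,
) -> list[str]:
    """Customers with more than min_count redemptions above min_amount in the window."""
    cutoff = (today - timedelta(days=window_days)).isoformat()
    rows = conn.execute(
        """
        SELECT customer_id
        FROM redemptions
        WHERE amount_brl > ? AND redeemed_on >= ?
        GROUP BY customer_id
        HAVING COUNT(*) > ?
        ORDER BY customer_id
        """,
        (min_amount, cutoff, min_count),
    ).fetchall()
    return [row[0] for row in rows]
```

Trying to answer the same question from episodic memory would mean scanning every session's checkpoints, which is exactly the workload a checkpoint store is not built for.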

The legal archive is the vault. Once written, no one modifies it. Not the senior DBA. Not even the AWS account root user, provided you configure S3 Object Lock in Compliance Mode rather than Governance Mode. The difference matters: in Governance Mode, an admin holding the s3:BypassGovernanceRetention permission can lift the lock or delete the object version. In Compliance Mode, the retention period is inviolable. It's the only configuration that satisfies a Brazilian Central Bank audit.
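As a sketch of what the write side of this looks like: the helper below only builds the arguments for an S3 `put_object` call, so it runs without AWS credentials. `ObjectLockMode` and `ObjectLockRetainUntilDate` are real `put_object` parameters; the function name, the bucket and key, and the retention length are assumptions, and the actual retention period has to come from your compliance team.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Optional


def archive_put_kwargs(
    bucket: str,
    key: str,
    body: bytes,
    retention_days: int,
    now: Optional[datetime] = None,
) -> dict[str, Any]:
    """Build put_object arguments that lock the object in Compliance Mode."""
    written_at = now or datetime.now(timezone.utc)
    return {
        "Bucket": bucket,
        "Key": key,
        "Body": body,
        # COMPLIANCE, not GOVERNANCE: no permission can shorten the retention.
        "ObjectLockMode": "COMPLIANCE",
        "ObjectLockRetainUntilDate": written_at + timedelta(days=retention_days),
    }
```

The result would be passed as `boto3.client("s3").put_object(**kwargs)`. The bucket must have been created with Object Lock enabled, which in turn requires versioning. Because a Compliance Mode lock cannot be undone, it is worth exercising this path with a very short retention in a throwaway bucket first.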

The execucao_ferramentas_subagentes table in analytics is especially critical: the parametros_payload field stores the exact arguments the LLM injected into the tool call. When the customer complains that "the assistant invested R$ 10,000 but I said R$ 1,000", that field is the irrefutable proof that the model didn't hallucinate (or did, and you need to know that before the regulator does).
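A minimal sketch of how such a record could be written, again on SQLite so it runs standalone. Only the table name and the `parametros_payload` field come from the text; the other columns (`session_id`, `nome_ferramenta`, `criado_em`) and the function names are hypothetical. The idea it illustrates is to serialize the arguments exactly as the model produced them and store them before the tool runs.

```python
import json
import sqlite3
from datetime import datetime, timezone
from typing import Any


def create_audit_db() -> sqlite3.Connection:
    """Create an in-memory table shaped like the audit table described above."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE execucao_ferramentas_subagentes ("
        " session_id TEXT NOT NULL,"          # hypothetical column
        " nome_ferramenta TEXT NOT NULL,"     # hypothetical column
        " parametros_payload TEXT NOT NULL,"  # field named in the article
        " criado_em TEXT NOT NULL)"           # hypothetical column
    )
    return conn


def record_tool_call(
    conn: sqlite3.Connection, session_id: str, tool_name: str, arguments: dict[str, Any]
) -> None:
    """Persist the exact arguments of a tool call; call this before executing the tool."""
    payload = json.dumps(arguments, sort_keys=True, ensure_ascii=False)
    conn.execute(
        "INSERT INTO execucao_ferramentas_subagentes VALUES (?, ?, ?, ?)",
        (session_id, tool_name, payload, datetime.now(timezone.utc).isoformat()),
    )


def logged_arguments(conn: sqlite3.Connection, session_id: str) -> list[tuple[str, dict[str, Any]]]:
    """Return (tool name, arguments) pairs for a session, in insertion order."""
    rows = conn.execute(
        "SELECT nome_ferramenta, parametros_payload"
        " FROM execucao_ferramentas_subagentes WHERE session_id = ? ORDER BY rowid",
        (session_id,),
    ).fetchall()
    return [(name, json.loads(payload)) for name, payload in rows]
```

Writing the record before the tool executes matters: if the tool call fails or times out, the evidence of what the model asked for still exists.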

Always think of the worst case

Storing well is necessary. But not sufficient.

Even with the five layers in the right place, the system still has an unsolved problem: the LLM can generate a response that violates CVM rules or the ANBIMA Code before anything is recorded. The legal archive stores the evidence of what was delivered to the customer, but if what was delivered was already a violation, storing it doesn't help.

That's why critique_node exists: the graph's compliance officer. And that's what I'll talk about next week: the four regulatory violations the LLM commits naturally, how the Circuit Breaker protects when the semantic guardrail fails, and the paradox between LGPD's right to erasure and the Central Bank's storage obligation.