RAG in Produktion: die unspektakulären Teile, Syntheon

Retrieval-Augmented Generation demos are magical: ask a question, get a grounded answer with citations. Production RAG is less glamorous. Here's what we've learned shipping it.

Chunking is everything

Your chunking strategy determines your retrieval quality. Too small and you lose context. Too large and you dilute relevance.

python
# Semantic chunking based on sentence boundaries
from langchain.text_splitter import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=512,
    chunk_overlap=64,
    separators=["\n\n", "\n", ". ", " ", ""]
)

The overlap sweet spot

Overlap prevents splitting key information across chunks. We've found 10-15% overlap works best, enough to bridge context without creating duplicate retrieval.

Freshness: the silent killer

Your vector index is stale the moment you build it. Without a refresh strategy, your RAG system will confidently cite outdated information. We recommend:

Incremental indexing, only re-embed changed documents
Timestamp filtering, exclude chunks older than your freshness window
Canary queries, automated tests that verify current information is retrievable

Citations users can actually verify

A citation that says "Source: document_42.pdf" is useless. Good citations include a snippet of the relevant text and a way to view the original.

If a user can't verify the source in under 5 seconds, they'll stop trusting the system entirely.