WEBHARMONIX
LLM

RAG in Production: The Unglamorous Parts

Chunking, freshness, citations, and the failure modes nobody shows in demos.

By Team Syntheon

Retrieval-Augmented Generation demos are magical: ask a question, get a grounded answer with citations. Production RAG is less glamorous. Here's what we've learned shipping it.

Chunking is everything

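Your chunking strategy largely determines your retrieval quality. Chunks that are too small lose the context needed to answer a question. Chunks that are too large dilute relevance, because a single embedding has to represent several topics at once.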

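```python
# Recursive character splitting: try paragraph breaks first, then line breaks,
# then sentence ends, then spaces, and only then split mid-word.
# (Older LangChain releases exposed this class as langchain.text_splitter.)
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=512,     # maximum chunk length, in characters by default
    chunk_overlap=64,   # characters shared by consecutive chunks (12.5%)
    separators=["\n\n", "\n", ". ", " ", ""],
)
```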

The overlap sweet spot

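Overlap reduces the chance that key information is cut at a chunk boundary and lost from both halves. We've found that an overlap of 10-15% of the chunk size works best (the 64-character overlap above is 12.5% of 512): enough to bridge context without retrieving near-duplicate chunks.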
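To make the percentage concrete, here is a minimal fixed-window chunker in plain Python. It is a simplified stand-in, not LangChain's splitter (which also tries to break on the separators listed above), and the names `chunk_with_overlap` and `overlap_ratio` are hypothetical, chosen for illustration.

```python
def chunk_with_overlap(text: str, chunk_size: int = 512, overlap_ratio: float = 0.125) -> list[str]:
    """Split text into fixed-size character windows that overlap by overlap_ratio."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap_ratio < 1:
        raise ValueError("overlap_ratio must be in [0, 1)")
    overlap = int(chunk_size * overlap_ratio)  # e.g. 512 * 0.125 = 64 characters
    step = chunk_size - overlap                # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


# Toy example: 4-character windows with 50% overlap.
print(chunk_with_overlap("abcdefghij", chunk_size=4, overlap_ratio=0.5))
# ['abcd', 'cdef', 'efgh', 'ghij']
```

Each neighbouring pair of chunks shares `overlap` characters, so at ratios like 10-15% any phrase shorter than the overlap that straddles a boundary still lands intact in the next chunk. Raising the ratio protects longer phrases but indexes more duplicated text.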

Freshness: the silent killer

Your vector index is stale the moment you build it. Without a refresh strategy, your RAG system will confidently cite outdated information. We recommend:

  1. Incremental indexing: re-embed only the documents that have changed
  2. Timestamp filtering: exclude chunks older than your freshness window
  3. Canary queries: automated tests that verify current information is retrievable
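The three practices can be sketched in a few plain-Python functions. This is a minimal illustration under our own assumptions, not any vector database's API: documents are keyed by a hypothetical `doc_id`, change detection uses a SHA-256 content hash, every chunk carries an `updated_at` timestamp, and the canary accepts any `retrieve` callable standing in for your real retriever.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Chunk:
    doc_id: str           # hypothetical document identifier
    text: str
    updated_at: datetime  # when the source document last changed


def content_hash(text: str) -> str:
    """Stable fingerprint of a document's content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def docs_to_reembed(docs: dict[str, str], indexed_hashes: dict[str, str]) -> list[str]:
    """Incremental indexing: return IDs of new or changed documents only.

    Documents deleted from `docs` are not detected here; their chunks
    must be removed from the index separately.
    """
    return [
        doc_id for doc_id, text in docs.items()
        if indexed_hashes.get(doc_id) != content_hash(text)
    ]


def filter_fresh(chunks: list[Chunk], window: timedelta, now: datetime) -> list[Chunk]:
    """Timestamp filtering: drop chunks older than the freshness window."""
    cutoff = now - window
    return [c for c in chunks if c.updated_at >= cutoff]


def run_canary(retrieve, query: str, expected_fact: str) -> bool:
    """Canary query: True if a known-current fact appears in the retrieved texts."""
    return any(expected_fact in text for text in retrieve(query))


# Example usage with made-up data.
now = datetime(2024, 6, 1, tzinfo=timezone.utc)
docs = {"pricing": "Plan A costs $20/month", "faq": "Unchanged text"}
indexed = {"faq": content_hash("Unchanged text")}
changed = docs_to_reembed(docs, indexed)  # ['pricing']

chunks = [
    Chunk("pricing", "Plan A costs $20/month", now - timedelta(days=2)),
    Chunk("old-pricing", "Plan A costs $15/month", now - timedelta(days=400)),
]
fresh = filter_fresh(chunks, window=timedelta(days=365), now=now)  # keeps only "pricing"

ok = run_canary(lambda q: [c.text for c in fresh], "How much is Plan A?", "$20/month")  # True
```

In production the canary would run on a schedule against the live retriever and alert when it returns False; the hash map would live alongside the index rather than in memory.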

Citations users can actually verify

A citation that says "Source: document_42.pdf" is useless. Good citations include a snippet of the relevant text and a way to view the original.
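A minimal sketch of such a citation follows. The `Citation` fields and `format_citation` helper are hypothetical, and the example data is invented. The `#page=N` suffix is a PDF open parameter that common browser PDF viewers honor to jump to a page; for other formats you would link to whatever anchor your document viewer supports.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Citation:
    title: str                  # human-readable document title
    url: str                    # link to the original document
    snippet: str                # the passage the answer relied on
    page: Optional[int] = None  # page number, if known


def format_citation(c: Citation, max_snippet: int = 200) -> str:
    """Render a citation a user can check: title, quoted snippet, and a deep link."""
    snippet = c.snippet.strip()
    if len(snippet) > max_snippet:
        snippet = snippet[:max_snippet].rstrip() + "..."
    if c.page is not None:
        label = f"{c.title}, p. {c.page}"
        link = f"{c.url}#page={c.page}"  # PDF open parameter: jump to the cited page
    else:
        label, link = c.title, c.url
    return f'{label}: "{snippet}"\n  View original: {link}'


c = Citation(
    title="Refund Policy 2024",
    url="https://example.com/docs/refund-policy.pdf",
    snippet="Refunds are issued within 14 days of a return.",
    page=3,
)
print(format_citation(c))
# Refund Policy 2024, p. 3: "Refunds are issued within 14 days of a return."
#   View original: https://example.com/docs/refund-policy.pdf#page=3
```

Quoting the exact snippet lets the user confirm the claim without opening anything, and the deep link covers the cases where they want the surrounding context.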

If a user can't verify the source in under 5 seconds, they'll stop trusting the system entirely.