How to Chunk Documents for RAG (With Examples)
By Jason McDonald · Published Jan 12, 2026 · 5 min read

Chunking is the most overlooked step in building RAG systems. Most teams spend weeks on prompt engineering while using default chunking settings—then wonder why their AI chatbot gives inconsistent answers.
Chunk size and strategy directly impact retrieval quality. Get it wrong, and your LLM receives incomplete context. Get it right, and answers become noticeably more accurate. For the complete RAG architecture guide, see our RAG for Business Guide.
Why Chunking Matters
When you embed a document for RAG, you don't embed the entire document. You split it into chunks, embed each chunk separately, and retrieve the most relevant chunks at query time.
The problem:
| Chunk Size | Retrieval Issue | Answer Quality |
|---|---|---|
| Too small (50 tokens) | Missing context | Incomplete answers |
| Too large (2000 tokens) | Diluted relevance | Irrelevant content mixed in |
| Wrong boundaries | Split mid-concept | Incoherent retrieval |
The goal is chunks that are:
- Self-contained — Make sense without surrounding text
- Focused — Cover one concept or topic
- Right-sized — 200-500 tokens for most use cases
Chunking Strategies
1. Fixed-Size Chunking (Simple)
Split text into equal-sized chunks with overlap.
```python
def fixed_size_chunk(text, chunk_size=400, overlap=50):
    """Split text into fixed-size character windows with overlap."""
    chunks = []
    start = 0
    while start < len(text):
        end = start + chunk_size
        chunks.append(text[start:end])
        start = end - overlap  # step back so adjacent chunks share `overlap` characters
    return chunks
```
When to use:
- Quick prototyping
- Uniform content (logs, transcripts)
- When structure is unknown
Drawbacks:
- Splits mid-sentence
- Ignores document structure
- Overlap can create redundancy
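A quick sanity check of the fixed-size splitter (the function is repeated so the snippet runs standalone; the digit string is just illustrative data):

```python
def fixed_size_chunk(text, chunk_size=400, overlap=50):
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

# 1,000 characters -> windows starting at positions 0, 350, 700
text = "".join(str(i % 10) for i in range(1000))
chunks = fixed_size_chunk(text)
print(len(chunks))                        # 3
print(chunks[0][-50:] == chunks[1][:50])  # True: adjacent chunks share 50 characters
```

The shared tail/head is exactly the redundancy the drawbacks list mentions; it is the price of not losing sentences that straddle a boundary.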
2. Semantic Chunking (Better)
Split at natural boundaries: sentences, paragraphs, sections.
```python
def semantic_chunk(text, max_tokens=400):
    """Pack whole paragraphs into chunks, splitting only at paragraph boundaries."""
    paragraphs = text.split('\n\n')
    chunks = []
    current_chunk = ""
    for para in paragraphs:
        # Character counts stand in for token counts here; swap in your
        # model's tokenizer for accurate sizing.
        if len(current_chunk) + len(para) < max_tokens:
            current_chunk += para + "\n\n"
        else:
            if current_chunk:
                chunks.append(current_chunk.strip())
            # Note: a single paragraph longer than max_tokens stays whole.
            current_chunk = para + "\n\n"
    if current_chunk:
        chunks.append(current_chunk.strip())
    return chunks
```
When to use:
- Documentation with clear structure
- Articles and guides
- Knowledge base content
Advantages:
- Respects content boundaries
- Better semantic coherence
- More meaningful retrievals
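To see the paragraph-packing behavior, here is the same function run on a small synthetic document (the paragraph contents are illustrative only):

```python
def semantic_chunk(text, max_tokens=400):
    paragraphs = text.split('\n\n')
    chunks = []
    current_chunk = ""
    for para in paragraphs:
        if len(current_chunk) + len(para) < max_tokens:
            current_chunk += para + "\n\n"
        else:
            if current_chunk:
                chunks.append(current_chunk.strip())
            current_chunk = para + "\n\n"
    if current_chunk:
        chunks.append(current_chunk.strip())
    return chunks

# Two short paragraphs fit together; the long third one starts a new chunk.
doc = "\n\n".join(["A" * 100, "B" * 100, "C" * 250])
chunks = semantic_chunk(doc, max_tokens=300)
print(len(chunks))  # 2
```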
3. Recursive Chunking (Production-Grade)
Hierarchically split: document -> section -> paragraph -> sentence.
```python
from langchain.text_splitter import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=400,
    chunk_overlap=50,
    separators=["\n## ", "\n### ", "\n\n", "\n", ". ", " ", ""],
)
chunks = splitter.split_text(document)
```
How it works:
- Try to split at `## ` (H2 headers)
- If chunks are still too large, split at `### ` (H3)
- Continue down to paragraphs, sentences, and words
When to use:
- Production RAG systems
- Mixed content types
- When chunk quality matters
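The recursive idea doesn't require LangChain. Here is a minimal dependency-free sketch, assuming character-based sizes (use a tokenizer for token-accurate chunks):

```python
def recursive_chunk(text, chunk_size=400, separators=("\n\n", "\n", ". ", " ")):
    """Try coarse separators first; recurse with finer ones on oversized pieces."""
    if len(text) <= chunk_size:
        return [text] if text.strip() else []
    # Find the coarsest separator present in this text.
    for i, sep in enumerate(separators):
        if sep in text:
            pieces = text.split(sep)
            break
    else:
        # No separator applies: fall back to a hard character split.
        return [text[j:j + chunk_size] for j in range(0, len(text), chunk_size)]
    chunks, current = [], ""
    for piece in pieces:
        candidate = current + sep + piece if current else piece
        if len(candidate) <= chunk_size:
            current = candidate  # greedily pack pieces into the current chunk
        else:
            if current:
                chunks.append(current)
            if len(piece) <= chunk_size:
                current = piece
            else:
                # One piece is still too big: recurse with finer separators.
                chunks.extend(recursive_chunk(piece, chunk_size, separators[i + 1:]))
                current = ""
    if current:
        chunks.append(current)
    return chunks
```

This mirrors the library's behavior in spirit, not detail; the production splitter also handles overlap and separator retention.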
4. Document-Aware Chunking (Best for Structured Content)
Parse document structure explicitly.
```python
import markdown
from bs4 import BeautifulSoup

def document_aware_chunk(markdown_text):
    """Render markdown to HTML, then group content under its nearest heading."""
    html = markdown.markdown(markdown_text)
    soup = BeautifulSoup(html, 'html.parser')
    chunks = []
    current_section = {"title": "", "content": ""}
    for element in soup.children:
        if element.name in ['h1', 'h2', 'h3']:
            # A new heading closes the previous section.
            if current_section["content"]:
                chunks.append(current_section)
            current_section = {
                "title": element.get_text(),
                "content": ""
            }
        else:
            current_section["content"] += element.get_text() + "\n"
    if current_section["content"]:
        chunks.append(current_section)
    return chunks
```
When to use:
- Documentation with consistent structure
- Help articles
- Technical guides
Advantages:
- Section titles preserved as metadata
- Perfect boundary detection
- Enables title-based filtering
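If you'd rather avoid the `markdown` and `bs4` dependencies, the same idea works directly on markdown source with a heading regex. A minimal sketch (function name and heading depth are assumptions):

```python
import re

def markdown_sections(md_text):
    """Split markdown into {title, content} sections at each H1-H3 heading line."""
    sections, title, body = [], "", []
    for line in md_text.splitlines():
        m = re.match(r"(#{1,3})\s+(.*)", line)
        if m:
            # A new heading closes the previous section.
            if any(l.strip() for l in body):
                sections.append({"title": title, "content": "\n".join(body).strip()})
            title, body = m.group(2), []
        else:
            body.append(line)
    if any(l.strip() for l in body):
        sections.append({"title": title, "content": "\n".join(body).strip()})
    return sections
```

Each section's title travels with its content, ready to be stored as metadata.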
Chunk Size Guidelines
| Content Type | Recommended Size | Reasoning |
|---|---|---|
| FAQ answers | 100-200 tokens | Self-contained answers |
| Documentation | 300-500 tokens | Complete concepts |
| Technical guides | 400-600 tokens | Code + explanation |
| Transcripts | 200-300 tokens | Conversational units |
| Legal/compliance | 500-800 tokens | Full clause context |
Rule of thumb: Start with 400 tokens, adjust based on retrieval quality testing.
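When a real tokenizer isn't wired in yet, a rough character-based heuristic is enough for sizing experiments (the ~4 characters-per-token figure is an approximation for English text, not a guarantee):

```python
def approx_tokens(text):
    # Rough heuristic: English text averages ~4 characters per token.
    # For exact counts, use the tokenizer that matches your embedding model.
    return max(1, len(text) // 4)
```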
Overlap: When and How Much
Overlap ensures concepts spanning chunk boundaries aren't lost.
Recommended overlap: 10-20% of chunk size
| Chunk Size | Overlap |
|---|---|
| 200 tokens | 20-40 tokens |
| 400 tokens | 40-80 tokens |
| 600 tokens | 60-120 tokens |
When to skip overlap:
- Document-aware chunking (boundaries are meaningful)
- FAQ-style content (questions are independent)
- Very short source documents
Metadata: The Chunking Superpower
Store metadata with each chunk for filtering and context:
```python
chunk = {
    "text": "The actual chunk content...",
    "metadata": {
        "source": "billing-faq.md",
        "section": "Refund Policy",
        "category": "billing",
        "updated_at": "2026-01-10",
        "customer_tier": "all",  # or "enterprise" for tier-specific docs
    },
}
```
At query time, filter by metadata:
```python
# Only retrieve billing-related chunks
results = vector_db.query(
    embedding=query_embedding,
    filter={"category": "billing"},
    top_k=5,
)
```
This dramatically improves retrieval precision for support chatbots with diverse knowledge bases.
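Under the hood, metadata filtering is just candidate pruning before similarity ranking. A hypothetical in-memory stand-in for a vector store's filtered query (the `filtered_query` name and chunk layout are assumptions, not any particular database's API):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def filtered_query(index, query_embedding, metadata_filter=None, top_k=5):
    """Drop chunks that fail the metadata filter, then rank the rest by similarity."""
    candidates = [
        c for c in index
        if not metadata_filter
        or all(c["metadata"].get(k) == v for k, v in metadata_filter.items())
    ]
    candidates.sort(key=lambda c: cosine(c["embedding"], query_embedding), reverse=True)
    return candidates[:top_k]
```

Real vector databases apply the filter inside the index rather than by brute force, but the retrieval semantics are the same.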
Testing Your Chunking Strategy
Method 1: Manual Inspection
- Chunk 10-20 representative documents
- Read random chunks
- Ask: "Does this make sense without context?"
- Adjust boundaries and size
Method 2: Retrieval Testing
- Create 20-30 test queries
- Run retrieval against your chunked corpus
- Score: Did the right chunk appear in top 3?
- Target: 80%+ retrieval accuracy
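The scoring step above is a one-liner once you have test cases and a retrieval function. A minimal sketch (the `retrieve` callable and test-case shape are assumptions):

```python
def retrieval_accuracy(test_cases, retrieve, k=3):
    """Fraction of test queries whose expected chunk id appears in the top-k results."""
    hits = sum(
        case["expected_chunk_id"] in retrieve(case["query"])[:k]
        for case in test_cases
    )
    return hits / len(test_cases)
```

Anything below the 80% target is a signal to revisit chunk size or boundaries before touching prompts.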
Method 3: End-to-End Testing
- Ask your RAG system real user questions
- Evaluate answer quality (correct, complete, relevant)
- When answers are wrong, trace back to chunking
- Iterate on chunk strategy
Common Chunking Mistakes
1. One-Size-Fits-All
Different content types need different chunking. FAQ answers shouldn't use the same 500-token chunks as technical documentation.
2. Ignoring Headers
Section headers are critical context. When chunking strips "## Refund Policy" from a chunk about refunds, retrieval suffers.
3. Over-Chunking
Tiny chunks (50-100 tokens) retrieve well but provide insufficient context for the LLM. You end up retrieving 20 chunks to get complete information.
4. Under-Testing
Most teams set chunking once and never revisit. As your knowledge base grows, optimal settings change.
The Bottom Line
Chunking isn't glamorous, but it's foundational. A well-chunked corpus with basic RAG often outperforms a poorly-chunked corpus with sophisticated retrieval.
Start with recursive chunking at 400 tokens, add metadata, and test with real queries. Refine from there based on what you observe.
Frequently Asked Questions
What is the best chunk size for RAG?
The best chunk size for RAG is typically 300-500 tokens for most documentation. Smaller chunks (100-200) work better for FAQ-style content. Larger chunks (500-800) suit legal or compliance documents where full context is critical.
How much overlap should chunks have?
Chunks should overlap by 10-20% of their size. For 400-token chunks, use 40-80 tokens of overlap. Skip overlap when using document-aware chunking with meaningful boundaries like section headers.
Should I chunk code differently than text?
Yes, code should be chunked differently. Keep functions and classes together when possible. Include docstrings with their associated code. Use larger chunks for code (500-800 tokens) to preserve context.
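For Python sources specifically, the standard library's ast module can keep each top-level function or class, docstring included, as one chunk. A minimal sketch of that idea (works on Python 3.8+, where `ast.get_source_segment` is available):

```python
import ast

def chunk_python_source(source):
    """Keep each top-level function or class, docstring included, as one chunk."""
    tree = ast.parse(source)
    return [
        ast.get_source_segment(source, node)
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
    ]
```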