
How to Chunk Documents for RAG (With Examples)

Written by Jason McDonald · Published Jan 12, 2026 · Updated May 06, 2026 · 5 min read

Chunking is the most overlooked step in building RAG systems. Most teams spend weeks on prompt engineering while using default chunking settings—then wonder why their AI chatbot gives inconsistent answers.

Chunk size and strategy directly impact retrieval quality. Get it wrong, and your LLM receives incomplete context. Get it right, and answers become noticeably more accurate. For the complete RAG architecture guide, see our RAG for Business Guide.

Why Chunking Matters

When you embed a document for RAG, you don't embed the entire document. You split it into chunks, embed each chunk separately, and retrieve the most relevant chunks at query time.
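
In code, the loop is short. This is a minimal sketch: embed() stands in for whatever embedding model you use, chunker for one of the strategies covered below, and the brute-force cosine similarity stands in for a real vector database.

import numpy as np

def index_document(document, chunker, embed):
    # Split the document, then embed each chunk separately
    chunks = chunker(document)
    vectors = np.array([embed(chunk) for chunk in chunks])
    return chunks, vectors

def retrieve(query, chunks, vectors, embed, top_k=3):
    # Score every chunk against the query and return the closest matches
    q = np.array(embed(query))
    scores = vectors @ q / (np.linalg.norm(vectors, axis=1) * np.linalg.norm(q) + 1e-9)
    return [chunks[i] for i in scores.argsort()[::-1][:top_k]]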

The problem:

Chunk Size                Retrieval Issue      Answer Quality
Too small (50 tokens)     Missing context      Incomplete answers
Too large (2000 tokens)   Diluted relevance    Irrelevant content mixed in
Wrong boundaries          Split mid-concept    Incoherent retrieval

The goal is chunks that are:

  • Self-contained — Make sense without surrounding text
  • Focused — Cover one concept or topic
  • Right-sized — 200-500 tokens for most use cases

Chunking Strategies

1. Fixed-Size Chunking (Simple)

Split text into equal-sized chunks with overlap.

def fixed_size_chunk(text, chunk_size=400, overlap=50):
    """Split text into fixed-size chunks with overlap (counts characters, not tokens)."""
    chunks = []
    start = 0
    while start < len(text):
        end = start + chunk_size
        chunks.append(text[start:end])
        # Step forward by chunk_size minus overlap so consecutive chunks share context
        start = end - overlap
    return chunks
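
A quick check of the behavior (note that the function counts characters, not tokens, so chunk_size only approximates a token budget):

text = "x" * 1000
chunks = fixed_size_chunk(text, chunk_size=400, overlap=50)
print(len(chunks))               # 3
print([len(c) for c in chunks])  # [400, 400, 300]; each chunk repeats the last 50 characters of the previous one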

When to use:

  • Quick prototyping
  • Uniform content (logs, transcripts)
  • When structure is unknown

Drawbacks:

  • Splits mid-sentence
  • Ignores document structure
  • Overlap can create redundancy

2. Semantic Chunking (Better)

Split at natural boundaries: sentences, paragraphs, sections.

def semantic_chunk(text, max_tokens=400):
    """Group paragraphs into chunks, splitting only at paragraph boundaries.

    Uses character counts as a rough proxy for tokens; a paragraph longer
    than max_tokens becomes its own oversized chunk.
    """
    paragraphs = text.split('\n\n')

    chunks = []
    current_chunk = ""

    for para in paragraphs:
        # Add the paragraph if it fits, otherwise close the current chunk and start a new one
        if len(current_chunk) + len(para) < max_tokens:
            current_chunk += para + "\n\n"
        else:
            if current_chunk:
                chunks.append(current_chunk.strip())
            current_chunk = para + "\n\n"

    if current_chunk:
        chunks.append(current_chunk.strip())

    return chunks

When to use:

  • Documentation with clear structure
  • Articles and guides
  • Knowledge base content

Advantages:

  • Respects content boundaries
  • Better semantic coherence
  • More meaningful retrievals

3. Recursive Chunking (Production-Grade)

Hierarchically split: document -> section -> paragraph -> sentence.

from langchain.text_splitter import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=400,
    chunk_overlap=50,
    separators=["\n## ", "\n### ", "\n\n", "\n", ". ", " ", ""]
)

chunks = splitter.split_text(document)

How it works:

  1. Try to split at ## (H2 headers)
  2. If chunks still too large, split at ### (H3)
  3. Continue down to paragraphs, sentences, words
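
A quick way to check where the boundaries landed is to print a preview of each chunk (document is any markdown string, as in the call above):

for i, chunk in enumerate(chunks):
    # Preview the start of each chunk to see where the split points fell
    print(i, repr(chunk[:60]))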

When to use:

  • Production RAG systems
  • Mixed content types
  • When chunk quality matters

4. Document-Aware Chunking (Best for Structured Content)

Parse document structure explicitly.

import markdown
from bs4 import BeautifulSoup, Tag

def document_aware_chunk(markdown_text):
    """Split markdown into sections keyed by their nearest heading."""
    html = markdown.markdown(markdown_text)
    soup = BeautifulSoup(html, 'html.parser')

    chunks = []
    current_section = {"title": "", "content": ""}

    for element in soup.children:
        # Skip whitespace-only text nodes between top-level tags
        if not isinstance(element, Tag):
            continue
        if element.name in ['h1', 'h2', 'h3']:
            # A new heading closes the previous section
            if current_section["content"]:
                chunks.append(current_section)
            current_section = {
                "title": element.get_text(),
                "content": ""
            }
        else:
            current_section["content"] += element.get_text() + "\n"

    if current_section["content"]:
        chunks.append(current_section)

    return chunks
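
For example, a small markdown document produces one chunk per heading that has body text (requires the markdown and beautifulsoup4 packages):

doc = """## Refund Policy
Refunds are issued within 14 days of purchase.

## Invoices
Invoices are emailed on the first of each month.
"""

for section in document_aware_chunk(doc):
    print(section["title"], "->", section["content"].strip())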

When to use:

  • Documentation with consistent structure
  • Help articles
  • Technical guides

Advantages:

  • Section titles preserved as metadata
  • Perfect boundary detection
  • Enables title-based filtering

Chunk Size Guidelines

Content Type        Recommended Size    Reasoning
FAQ answers         100-200 tokens      Self-contained answers
Documentation       300-500 tokens      Complete concepts
Technical guides    400-600 tokens      Code + explanation
Transcripts         200-300 tokens      Conversational units
Legal/compliance    500-800 tokens      Full clause context

Rule of thumb: Start with 400 tokens, adjust based on retrieval quality testing.
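
If your knowledge base mixes these content types, it helps to keep the sizes in one place instead of hard-coding a single value (the category keys below are illustrative, not a standard):

# Illustrative starting points per content type, in tokens
CHUNK_SIZES = {
    "faq": 150,
    "documentation": 400,
    "technical_guides": 500,
    "transcripts": 250,
    "legal": 650,
}

def chunk_size_for(content_type):
    # Fall back to the 400-token rule of thumb for anything unlisted
    return CHUNK_SIZES.get(content_type, 400)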

Overlap: When and How Much

Overlap ensures concepts spanning chunk boundaries aren't lost.

Recommended overlap: 10-20% of chunk size

Chunk Size    Overlap
200 tokens    20-40 tokens
400 tokens    40-80 tokens
600 tokens    60-120 tokens

When to skip overlap:

  • Document-aware chunking (boundaries are meaningful)
  • FAQ-style content (questions are independent)
  • Very short source documents

Metadata: The Chunking Superpower

Store metadata with each chunk for filtering and context:

chunk = {
    "text": "The actual chunk content...",
    "metadata": {
        "source": "billing-faq.md",
        "section": "Refund Policy",
        "category": "billing",
        "updated_at": "2026-01-10",
        "customer_tier": "all"  # or "enterprise" for tier-specific docs
    }
}

At query time, filter by metadata:

# Only retrieve billing-related chunks
results = vector_db.query(
    embedding=query_embedding,
    filter={"category": "billing"},
    top_k=5
)

This dramatically improves retrieval precision for support chatbots with diverse knowledge bases.

Testing Your Chunking Strategy

Method 1: Manual Inspection

  1. Chunk 10-20 representative documents
  2. Read random chunks
  3. Ask: "Does this make sense without context?"
  4. Adjust boundaries and size
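
A few lines of Python make step 2 painless (chunks here is whatever list your chosen strategy produced):

import random

# Read a random sample of chunks out of context
for chunk in random.sample(chunks, k=min(10, len(chunks))):
    print("-" * 40)
    print(chunk)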

Method 2: Retrieval Testing

  1. Create 20-30 test queries
  2. Run retrieval against your chunked corpus
  3. Score: Did the right chunk appear in top 3?
  4. Target: 80%+ retrieval accuracy
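
A minimal scorer for this check, assuming test_cases is a list of (query, expected_chunk_id) pairs and search() is a placeholder that returns ranked chunk ids from your retrieval stack:

def retrieval_accuracy(test_cases, search, k=3):
    # Fraction of queries whose expected chunk shows up in the top k results
    hits = 0
    for query, expected_id in test_cases:
        if expected_id in search(query, top_k=k):
            hits += 1
    return hits / len(test_cases)

# Target: retrieval_accuracy(test_cases, search, k=3) >= 0.8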

Method 3: End-to-End Testing

  1. Ask your RAG system real user questions
  2. Evaluate answer quality (correct, complete, relevant)
  3. When answers are wrong, trace back to chunking
  4. Iterate on chunk strategy

Common Chunking Mistakes

1. One-Size-Fits-All

Different content types need different chunking. FAQ answers shouldn't use the same 500-token chunks as technical documentation.

2. Ignoring Headers

Section headers are critical context. When chunking strips "## Refund Policy" from a chunk about refunds, retrieval suffers.

3. Over-Chunking

Tiny chunks (50-100 tokens) retrieve well but provide insufficient context for the LLM. You end up retrieving 20 chunks to get complete information.

4. Under-Testing

Most teams set chunking once and never revisit. As your knowledge base grows, optimal settings change.

The Bottom Line

Chunking isn't glamorous, but it's foundational. A well-chunked corpus with basic RAG often outperforms a poorly-chunked corpus with sophisticated retrieval.

Start with recursive chunking at 400 tokens, add metadata, and test with real queries. Refine from there based on what you observe.

Frequently Asked Questions

What is the best chunk size for RAG?

The best chunk size for RAG is typically 300-500 tokens for most documentation. Smaller chunks (100-200) work better for FAQ-style content. Larger chunks (500-800) suit legal or compliance documents where full context is critical.

How much overlap should chunks have?

Chunks should overlap by 10-20% of their size. For 400-token chunks, use 40-80 tokens of overlap. Skip overlap when using document-aware chunking with meaningful boundaries like section headers.

Should I chunk code differently than text?

Yes, code should be chunked differently. Keep functions and classes together when possible. Include docstrings with their associated code. Use larger chunks for code (500-800 tokens) to preserve context.
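
One way to do this for Python source is to split on top-level definitions with the standard library's ast module. This is a sketch that keeps docstrings attached to their functions; other languages need their own parsers.

import ast

def chunk_python_source(source):
    # One chunk per top-level function or class; docstrings stay with their code
    tree = ast.parse(source)
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            segment = ast.get_source_segment(source, node)  # exact source text for the node
            if segment:
                chunks.append(segment)
    return chunks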
