Research

Blended RAG: Improving RAG Accuracy with Semantic Search and Hybrid Query-Based Retrievers

AIntropy Research Team · Published April 2024 · 12 min read
📄 Read on arXiv · 🏆 NJ Open Data Leaderboard · 🧪 ChemRAG Leaderboard

The problem with RAG at scale

Retrieval-Augmented Generation is the standard approach for connecting Large Language Models to private data. Feed documents into a vector store, retrieve relevant chunks at query time, pass them to the LLM for generation.
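That pipeline can be sketched in a few lines. Everything below is a toy stand-in for illustration: the corpus, the bag-of-words "embedding", and the prompt assembly are our assumptions, not Blended RAG itself; a real deployment would use a vector store (e.g. Elasticsearch or FAISS) with learned embeddings and an actual LLM call.

```python
from collections import Counter
import math

# Hypothetical in-memory corpus; a production system indexes millions of
# documents in a vector store rather than a dict.
DOCS = {
    "doc1": "New Jersey budget records for fiscal year 2023",
    "doc2": "Chemical safety data sheet for toluene storage",
    "doc3": "Court transcript of the deposition hearing",
}

def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words term-frequency vector."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a)
    if dot == 0:
        return 0.0
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb)

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the ids of the k most similar documents to the query."""
    q = embed(query)
    ranked = sorted(DOCS, key=lambda d: cosine(q, embed(DOCS[d])), reverse=True)
    return ranked[:k]

def answer(query: str) -> str:
    """Assemble the prompt context; a real system sends this to an LLM."""
    context = "\n".join(DOCS[d] for d in retrieve(query))
    return f"CONTEXT:\n{context}\nQUESTION: {query}"
```

The whole accuracy argument of the post hinges on `retrieve`: if the right document never enters `context`, no generation step can recover it.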

It works well on small corpora. But as the number of documents scales into the millions, across diverse formats (PDFs, spreadsheets, JSON, images, video), traditional single-retriever approaches degrade. The retriever becomes the bottleneck, not the model.

Most RAG accuracy problems are retrieval problems. If the right document is not in the context window, the best model in the world cannot generate the right answer.

Our approach: Blended Retrievers

Instead of relying on a single retrieval method, we blend multiple semantic search techniques. Dense Vector indexes (kNN) excel at semantic similarity. Sparse Encoder indexes (SERM) capture exact lexical matches. BM25 handles keyword-heavy queries. Each has strengths the others lack.

We combine them with hybrid query strategies that dynamically weight each retriever based on the query type. A factual lookup query triggers different retrieval weights than a conceptual reasoning query.
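A minimal sketch of that score fusion, under our own assumptions: the min-max normalization, the word-count heuristic in `query_weights`, and the specific weight values are illustrative placeholders, not the routing logic from the paper. Each retriever (BM25, sparse, dense) is represented only by its output scores.

```python
def minmax(scores: dict[str, float]) -> dict[str, float]:
    """Normalize one retriever's raw scores to [0, 1] so they can be blended."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {doc: (s - lo) / span for doc, s in scores.items()}

def query_weights(query: str) -> dict[str, float]:
    """Crude stand-in for query-type routing: short keyword queries lean on
    BM25, longer conceptual queries lean on dense retrieval."""
    if len(query.split()) <= 3:
        return {"bm25": 0.5, "sparse": 0.3, "dense": 0.2}
    return {"bm25": 0.2, "sparse": 0.3, "dense": 0.5}

def blend(query: str,
          results: dict[str, dict[str, float]]) -> list[tuple[str, float]]:
    """Weighted fusion of per-retriever scores; returns docs ranked best-first.

    `results` maps retriever name -> {doc_id: raw score}.
    """
    weights = query_weights(query)
    fused: dict[str, float] = {}
    for name, scores in results.items():
        for doc, s in minmax(scores).items():
            fused[doc] = fused.get(doc, 0.0) + weights.get(name, 0.0) * s
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)
```

Because each retriever's scores are normalized before blending, a document that ranks highly under several retrievers outscores one that dominates only a single index.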

Results

Key benchmark results across standard IR and QA datasets:

| Dataset | Best Method | Top-5 Accuracy | Top-10 Accuracy |
| --- | --- | --- | --- |
| SQuAD | KNN + Best Field | 94.89% | 97.43% |
| NQ (Natural Questions) | SERM + Best Field | 88.22% | 88.77% |
| TREC-COVID (Score2) | SERM + Best Field | 94% | 98% |
| PubMedQA (MetaGen) | Hybrid boosted + enriched | 82.1% | - |

The SQuAD result is particularly notable: 94.89% top-5 retrieval accuracy surpasses even fine-tuned models, and it is achieved entirely through retrieval strategy, with no model modification. The same approach therefore transfers to any private dataset without training.

MetaGen: Metadata enrichment

We extended Blended RAG with MetaGen, a metadata enrichment pipeline that uses LLMs to generate comprehensive metadata for each document. This enriched metadata improves indexing precision and retrieval recall across PubMedQA, NQ, and SQuAD datasets.
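The enrichment step can be sketched as below. The prompt wording, the metadata field names (`title`, `keywords`, `summary`), and the extractive fallback are our assumptions for illustration, not the actual MetaGen pipeline; `llm` stands in for any callable that maps a prompt to a JSON string.

```python
import json

def enrich_metadata(doc_text: str, llm=None) -> dict:
    """Generate enrichment metadata for one document.

    `llm` is a hypothetical callable (prompt -> JSON string). When none is
    supplied, a trivial extractive fallback keeps the sketch runnable.
    """
    prompt = (
        "Return JSON with keys 'title', 'keywords', 'summary' "
        f"for this document:\n{doc_text}"
    )
    if llm is not None:
        return json.loads(llm(prompt))
    words = doc_text.split()
    return {
        "title": " ".join(words[:5]),
        "keywords": sorted({w.lower().strip(".,") for w in words if len(w) > 6}),
        "summary": doc_text[:120],
    }

def index_record(doc_id: str, doc_text: str) -> dict:
    """Attach enriched metadata to the record before indexing, so the
    metadata fields are searchable alongside the document body."""
    return {"id": doc_id, "body": doc_text, "metadata": enrich_metadata(doc_text)}
```

The point of the design is that enrichment happens at index time, once per document, so query-time latency is unaffected while every retriever gains extra fields to match against.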

The key insight: better metadata means better retrieval means better answers. The model quality matters far less than the quality of what you put in front of it.

From benchmarks to production

These results power Kurious in production. When we search 85 million NJ government records in 0.2 seconds, or find the exact moment in 4,000 hours of legal video, the Blended RAG architecture is what makes it possible.

The leaderboards are live and open. Verify the results yourself:

๐Ÿ† NJ Open Data Leaderboard ๐Ÿงช ChemRAG Leaderboard ๐Ÿ“„ Full paper on arxiv

Underlying technology may be protected by one or more patents pending with the USPTO.

© 2026 AIntropy AI. All rights reserved.