Research

Blended RAG: Improving RAG Accuracy with Semantic Search and Hybrid Query-Based Retrievers

AIntropy Research Team · Published April 2024 · 12 min read
📄 Read on arXiv · 🏆 NJ Open Data Leaderboard · 🧪 ChemRAG Leaderboard

The problem with RAG at scale

Retrieval-Augmented Generation is the standard approach for connecting Large Language Models to private data. Feed documents into a vector store, retrieve relevant chunks at query time, pass them to the LLM for generation.
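That pipeline can be sketched in a few lines. Everything below is a toy stand-in for illustration: the corpus, the bag-of-words "embedding", and the prompt assembly are our assumptions, not Blended RAG itself; a real deployment would use a vector store (e.g. Elasticsearch or FAISS) with learned embeddings and an actual LLM call.

```python
from collections import Counter
import math

# Hypothetical in-memory corpus; a production system indexes millions of
# documents in a vector store rather than a dict.
DOCS = {
    "doc1": "New Jersey budget records for fiscal year 2023",
    "doc2": "Chemical safety data sheet for toluene storage",
    "doc3": "Court transcript of the deposition hearing",
}

def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words term-frequency vector."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a)
    if dot == 0:
        return 0.0
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb)

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the ids of the k most similar documents to the query."""
    q = embed(query)
    ranked = sorted(DOCS, key=lambda d: cosine(q, embed(DOCS[d])), reverse=True)
    return ranked[:k]

def answer(query: str) -> str:
    """Assemble the prompt context; a real system sends this to an LLM."""
    context = "\n".join(DOCS[d] for d in retrieve(query))
    return f"CONTEXT:\n{context}\nQUESTION: {query}"
```

The whole accuracy argument of the post hinges on `retrieve`: if the right document never enters `context`, no generation step can recover it.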

It works well on small corpora. But as the number of documents scales into the millions, across diverse formats (PDFs, spreadsheets, JSON, images, video), traditional single-retriever approaches degrade. The retriever becomes the bottleneck, not the model.

Most RAG accuracy problems are retrieval problems. If the right document is not in the context window, the best model in the world cannot generate the right answer.

Our approach: Blended Retrievers

Instead of relying on a single retrieval method, we blend multiple semantic search techniques. Dense Vector indexes (kNN) excel at semantic similarity. Sparse Encoder indexes (SERM) capture exact lexical matches. BM25 handles keyword-heavy queries. Each has strengths the others lack.

We combine them with hybrid query strategies that dynamically weight each retriever based on the query type. A factual lookup query triggers different retrieval weights than a conceptual reasoning query.
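A minimal sketch of that score fusion, under our own assumptions: the min-max normalization, the word-count heuristic in `query_weights`, and the specific weight values are illustrative placeholders, not the routing logic from the paper. Each retriever (BM25, sparse, dense) is represented only by its output scores.

```python
def minmax(scores: dict[str, float]) -> dict[str, float]:
    """Normalize one retriever's raw scores to [0, 1] so they can be blended."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {doc: (s - lo) / span for doc, s in scores.items()}

def query_weights(query: str) -> dict[str, float]:
    """Crude stand-in for query-type routing: short keyword queries lean on
    BM25, longer conceptual queries lean on dense retrieval."""
    if len(query.split()) <= 3:
        return {"bm25": 0.5, "sparse": 0.3, "dense": 0.2}
    return {"bm25": 0.2, "sparse": 0.3, "dense": 0.5}

def blend(query: str,
          results: dict[str, dict[str, float]]) -> list[tuple[str, float]]:
    """Weighted fusion of per-retriever scores; returns docs ranked best-first.

    `results` maps retriever name -> {doc_id: raw score}.
    """
    weights = query_weights(query)
    fused: dict[str, float] = {}
    for name, scores in results.items():
        for doc, s in minmax(scores).items():
            fused[doc] = fused.get(doc, 0.0) + weights.get(name, 0.0) * s
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)
```

Because each retriever's scores are normalized before blending, a document that ranks highly under several retrievers outscores one that dominates only a single index.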

Results

Key benchmark results across standard IR and QA datasets:

| Dataset | Best Method | Top-5 Accuracy | Top-10 Accuracy |
| --- | --- | --- | --- |
| SQuAD | KNN + Best Field | 94.89% | 97.43% |
| NQ (Natural Questions) | SERM + Best Field | 88.22% | 88.77% |
| TREC-COVID (Score2) | SERM + Best Field | 94% | 98% |
| PubMedQA (MetaGen) | Hybrid boosted + enriched | 82.1% | - |

The SQuAD result is particularly notable: 94.89% top-5 retrieval accuracy surpasses even fine-tuned models, and it is achieved entirely through retrieval strategy, with no model modification. The same approach therefore transfers to any private dataset without training.

MetaGen: Metadata enrichment

We extended Blended RAG with MetaGen, a metadata enrichment pipeline that uses LLMs to generate comprehensive metadata for each document. This enriched metadata improves indexing precision and retrieval recall across PubMedQA, NQ, and SQuAD datasets.
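The enrichment step can be sketched as below. The prompt wording, the metadata field names (`title`, `keywords`, `summary`), and the extractive fallback are our assumptions for illustration, not the actual MetaGen pipeline; `llm` stands in for any callable that maps a prompt to a JSON string.

```python
import json

def enrich_metadata(doc_text: str, llm=None) -> dict:
    """Generate enrichment metadata for one document.

    `llm` is a hypothetical callable (prompt -> JSON string). When none is
    supplied, a trivial extractive fallback keeps the sketch runnable.
    """
    prompt = (
        "Return JSON with keys 'title', 'keywords', 'summary' "
        f"for this document:\n{doc_text}"
    )
    if llm is not None:
        return json.loads(llm(prompt))
    words = doc_text.split()
    return {
        "title": " ".join(words[:5]),
        "keywords": sorted({w.lower().strip(".,") for w in words if len(w) > 6}),
        "summary": doc_text[:120],
    }

def index_record(doc_id: str, doc_text: str) -> dict:
    """Attach enriched metadata to the record before indexing, so the
    metadata fields are searchable alongside the document body."""
    return {"id": doc_id, "body": doc_text, "metadata": enrich_metadata(doc_text)}
```

The point of the design is that enrichment happens at index time, once per document, so query-time latency is unaffected while every retriever gains extra fields to match against.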

The key insight: better metadata means better retrieval means better answers. The model quality matters far less than the quality of what you put in front of it.

From benchmarks to production

These results power Kurious in production. When we search 85 million NJ government records in 0.2 seconds, or find the exact moment in 4,000 hours of legal video, the Blended RAG architecture is what makes it possible.

The leaderboards are live and open. Verify the results yourself:

๐Ÿ† NJ Open Data Leaderboard ๐Ÿงช ChemRAG Leaderboard ๐Ÿ“„ Full paper on arxiv

Underlying technology may be protected by one or more patents pending with the USPTO.

© 2026 AIntropy AI. All rights reserved.