Retrieval-Augmented Generation (RAG) is the standard approach for connecting Large Language Models (LLMs) to private data: feed documents into a vector store, retrieve relevant chunks at query time, and pass them to the LLM for generation.
It works well on small corpora. But as a corpus scales into millions of documents across diverse formats (PDFs, spreadsheets, JSON, images, video), traditional single-retriever approaches degrade. The retriever, not the model, becomes the bottleneck.
Most RAG accuracy problems are retrieval problems. If the right document is not in the context window, the best model in the world cannot generate the right answer.
Instead of relying on a single retrieval method, we blend multiple search techniques. Dense vector indexes (kNN) excel at semantic similarity. Sparse encoder indexes (SERM) capture learned term importance, bridging lexical and semantic matching. BM25 handles exact keyword matches. Each has strengths the others lack.
We combine them with hybrid query strategies that dynamically weight each retriever based on the query type. A factual lookup query triggers different retrieval weights than a conceptual reasoning query.
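The weighting scheme above can be sketched as score fusion: normalize each retriever's scores so they are comparable, then sum them under query-type-dependent weights. The weight values and the retriever scores below are illustrative placeholders, not the production configuration.

```python
from typing import Dict, List

# Hypothetical per-retriever weights by query type; the real system's
# weights and query classifier are not public.
WEIGHTS = {
    "factual":    {"knn": 0.2, "serm": 0.5, "bm25": 0.3},
    "conceptual": {"knn": 0.6, "serm": 0.3, "bm25": 0.1},
}

def normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Min-max normalize one retriever's scores so scales are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {doc: (s - lo) / span for doc, s in scores.items()}

def blend(results: Dict[str, Dict[str, float]],
          query_type: str, k: int = 5) -> List[str]:
    """Fuse per-retriever scores into one ranked list using query-type weights."""
    weights = WEIGHTS[query_type]
    fused: Dict[str, float] = {}
    for retriever, scores in results.items():
        w = weights[retriever]
        for doc, s in normalize(scores).items():
            fused[doc] = fused.get(doc, 0.0) + w * s
    return sorted(fused, key=fused.get, reverse=True)[:k]

# Toy scores from three retrievers for one query (note the different scales)
results = {
    "knn":  {"doc_a": 0.91, "doc_b": 0.85, "doc_c": 0.40},
    "serm": {"doc_b": 12.1, "doc_d": 11.8, "doc_a": 3.2},
    "bm25": {"doc_b": 7.4,  "doc_c": 6.9},
}
print(blend(results, "factual", k=2))  # → ['doc_b', 'doc_d']
```

A factual query leans on the sparse and keyword retrievers; a conceptual query shifts weight toward dense vectors. The normalization step matters because raw kNN, SERM, and BM25 scores live on different scales.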
Key benchmark results across standard IR and QA datasets:
| Dataset | Best Method | Top-5 Accuracy | Top-10 Accuracy |
|---|---|---|---|
| SQuAD | KNN + Best Field | 94.89% | 97.43% |
| NQ (Natural Questions) | SERM + Best Field | 88.22% | 88.77% |
| TREC-COVID (Score2) | SERM + Best Field | 94% | 98% |
| PubMedQA (MetaGen) | Hybrid boosted + enriched | 82.1% | n/a |
The SQuAD result is particularly notable: 94.89% top-5 retrieval accuracy surpasses even fine-tuned models, achieved entirely through retrieval strategy with no model modification. The same approach therefore transfers to any private dataset without training.
We extended Blended RAG with MetaGen, a metadata enrichment pipeline that uses LLMs to generate comprehensive metadata for each document. This enriched metadata improves indexing precision and retrieval recall across PubMedQA, NQ, and SQuAD datasets.
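A minimal sketch of that enrichment step, assuming a JSON-returning LLM callable and an illustrative field set (`title`, `summary`, `keywords`, `domain`); the actual MetaGen prompt and schema are not shown here, and the `fake_llm` stub stands in for a real client:

```python
import json
from typing import Callable, Dict

# Hypothetical prompt; the published pipeline's prompt may differ.
PROMPT = (
    "Return JSON with keys: title, summary, keywords, domain "
    "for the following document:\n\n{text}"
)

def enrich(doc: Dict[str, str], llm: Callable[[str], str]) -> Dict[str, object]:
    """Attach LLM-generated metadata to a document before indexing."""
    meta = json.loads(llm(PROMPT.format(text=doc["text"])))
    return {**doc, "metadata": meta}

# Stub LLM so the sketch runs offline; swap in a real client in production.
def fake_llm(prompt: str) -> str:
    return json.dumps({
        "title": "Aspirin and cardiovascular risk",
        "summary": "Discusses low-dose aspirin for primary prevention.",
        "keywords": ["aspirin", "cardiology", "prevention"],
        "domain": "biomedical",
    })

doc = {"id": "pmid-42", "text": "Low-dose aspirin reduces ..."}
enriched = enrich(doc, fake_llm)
print(enriched["metadata"]["keywords"])  # → ['aspirin', 'cardiology', 'prevention']
```

The enriched fields are then indexed alongside the document text, giving the retrievers more signal to match against.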
The key insight: better metadata means better retrieval means better answers. The model quality matters far less than the quality of what you put in front of it.
These results power Kurious in production. When we search 85 million NJ government records in 0.2 seconds, or find the exact moment in 4,000 hours of legal video, the Blended RAG architecture is what makes it possible.
The leaderboards are live and open; you can verify the results yourself.
Underlying technology may be protected by one or more patent applications pending with the USPTO.
© 2026 AIntropy AI. All rights reserved.