Your AI Has Never Seen Your Data

85 million government records. 23 agencies. 8+ formats. Every frontier model you know sees none of it. Here is why, and what we built.

AIntropy · March 2026 · 8 min read

TL;DR

New Jersey's open data portal holds 85 million records across 23 agencies in 8+ formats. GPT, Gemini, and Claude cannot answer a single question that crosses agency lines — not because they are bad models, but because this data was never in their training set and never will be. We built Kurious to fix this. It answers in seconds.

The data landscape no one talks about

New Jersey has one of the most comprehensive open data portals in the United States. Over 85 million records spanning pensions, corrections, public utilities, education, healthcare, transportation, and more. Maintained by 23 government agencies. Stored across 789 data sources in spreadsheets, PDFs, JSON files, images, ZIPs, and more.

It is all publicly available. It has always been publicly available. And yet, if you ask any frontier AI model a question that requires connecting two of those agencies — say, comparing pension liability data to education budget allocations — you will get one of three responses: a confident hallucination, an admission of ignorance, or a redirection to "please consult official sources."

The data was always there. The AI was just blind to it.

[Figure: 23 Isolated Agency Silos. Labeled clusters include Health, Education, Transport, Treasury, Justice, Corrections, Environment, Labor, Agriculture, Human Svcs, Public Safety, Veterans, Banking, Children, Community, Insurance, and Economic Dev. Each cluster is an isolated agency. The dots have never connected.]

This is not a New Jersey problem. It is the default state of institutional data everywhere. Governments, hospitals, enterprises — all of them sitting on massive, fragmented data landscapes that no AI system has ever been able to navigate. Not because the data is bad. Because the systems were built in isolation, and AI was never trained to bridge them.

"The data was never the problem. Perception was."

Why frontier models cannot help

When people say "frontier AI," they mean the best models available: GPT-4o, Gemini Ultra, Claude 3.5. These are remarkable systems. They can write code, summarize legal documents, reason through complex problems, and carry on nuanced conversations. But they share a fundamental constraint that never gets talked about.

They were trained on the internet. And your data — your organization's data, your government's data — was not on it.

NJ Open Data contains 23,000+ PDFs packed with embedded images, tables, charts, and scanned documents. It contains spreadsheets with inconsistent schemas across agencies. It contains ZIP archives whose files come in formats that overlap but do not align. No frontier model was trained on any of this. And even if you pass this data to a model through retrieval-augmented generation (RAG), traditional RAG systems fail in exactly the scenarios where the answer requires understanding multiple formats across multiple sources at once.
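
To make "inconsistent schemas across agencies" concrete, here is a minimal Python sketch of the kind of column harmonization a cross-agency query needs before any rows can be compared. Everything in it is an assumption for illustration: the column names, the alias table, and the sample rows are invented, and this is not a description of how Kurious works internally.

```python
# Hypothetical alias table: maps each agency's column name to one canonical name.
# Real exports would need a mapping built from the actual files.
COLUMN_ALIASES = {
    "municipality": "township",
    "muni_name": "township",
    "township": "township",
    "employee_count": "employees",
    "num_employees": "employees",
}


def normalize_row(row):
    """Rename known columns to canonical names and trim string values."""
    out = {}
    for key, value in row.items():
        canonical = COLUMN_ALIASES.get(key.strip().lower())
        if canonical is None:
            continue  # skip columns that have no known mapping
        out[canonical] = value.strip() if isinstance(value, str) else value
    return out


# Invented rows standing in for two agencies' exports of the same fact.
agency_a = [{"Municipality": "Township A ", "Employee_Count": "120"}]
agency_b = [{"muni_name": "Township A", "num_employees": "118"}]

normalized = [normalize_row(r) for r in agency_a + agency_b]
print(normalized)
```

Even in this toy case the two sources disagree on column names, capitalization, and whitespace. A hand-written alias table like this does not scale to hundreds of data sources, which is the gap the article is describing.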

[Figure: format coverage comparison]
Frontier LLMs: PDF ✕, CSV ✕, JSON ✕, XLSX ✕, JPEG ✕, DOC ✕. Result: incomplete and inaccurate.
With Kurious: PDF ✓, CSV ✓, JSON ✓, XLSX ✓, JPEG ✓, DOC ✓. Result: complete answer in seconds.
The data was always there. Kurious just connects it all.

We tested this ourselves. Take a question like: "Which New Jersey townships have the most employees with terminal leave benefits?" Answering it correctly requires crossing HR filings, payroll records, and benefit disclosures from agencies whose systems were never designed to talk to each other. We ran this through GPT-4o, Gemini Ultra, and Claude 3.5. Every one of them failed, and not with a clear error: each returned a confident, plausible, incorrect answer.
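
As a rough picture of why the terminal leave question needs more than one dataset, the sketch below joins two toy record sets on a shared employee ID and counts matches per township. The field names, IDs, and rows are all hypothetical and invented for illustration. The real portal's files do not necessarily share a key like this, and the sketch is not the method Kurious uses.

```python
from collections import Counter

# Invented toy records; field names are assumptions, not the portal's schema.
hr_filings = [
    {"employee_id": "E1", "township": "Township A"},
    {"employee_id": "E2", "township": "Township A"},
    {"employee_id": "E3", "township": "Township B"},
]
benefit_disclosures = [
    {"employee_id": "E1", "benefit": "terminal leave"},
    {"employee_id": "E2", "benefit": "terminal leave"},
    {"employee_id": "E3", "benefit": "health"},
]


def terminal_leave_by_township(hr_rows, benefit_rows):
    """Count employees with a terminal leave benefit, grouped by township."""
    # Look up each employee's township from the HR dataset.
    township_of = {r["employee_id"]: r["township"] for r in hr_rows}
    # Collect employee IDs that appear with the benefit in the second dataset.
    eligible = {
        r["employee_id"] for r in benefit_rows if r["benefit"] == "terminal leave"
    }
    # Join the two on employee_id and tally per township.
    return Counter(township_of[e] for e in eligible if e in township_of)


counts = terminal_leave_by_township(hr_filings, benefit_disclosures)
print(counts.most_common())
```

Neither dataset can answer the question alone: one knows where people work, the other knows what benefits they hold. A model that has seen neither file can only guess.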

Introducing Kurious

We built Kurious to solve exactly this problem. Not as a wrapper around an existing model. Not as a prompt-engineering trick. As a fundamentally different approach to how AI reads and reasons over heterogeneous, real-world data.

Natively multimodal
PDF, CSV, JSON, JPEG, XLSX, video. All formats, one engine. Nothing gets left behind.

Cross-silo reasoning
Connects pension data to education budgets to policy documents in a single query.

Answers in seconds
No training. No fine-tuning. Connect your data and go live in days, not months.

What it looks like in practice

We deployed Kurious on NJ Open Data. The entire portal — 85 million records, 789 data sources, 23 agencies, 8+ formats — indexed and queryable in plain language.

The thinking steps are visible. The sources are cited. And the answer is correct — verified against ground truth across multiple agency datasets. No hallucination. No hedging. No "please consult official sources."

"0 frontier LLMs can answer a question that crosses NJ agency silos. Kurious can — in under 10 seconds."

The numbers

85M+ Records indexed
789 Data sources
23K+ PDFs with embedded images
23 Government agencies
8+ File formats
0 Frontier LLMs that can handle it

Why this matters beyond New Jersey

New Jersey is not a special case. It is a representative case. Every state government, every hospital network, every enterprise with more than a decade of data history is sitting on the same problem: massive, fragmented, heterogeneous data that no AI system has ever been able to navigate.

The opportunity is not in building a better chatbot. It is in making the data that already exists — that organizations have spent decades and billions collecting — finally useful to the AI systems being deployed on top of it.

That is what Kurious does. And NJ Open Data is just the first proof point.

Try it yourself

Ask anything about NJ Open Data. Live and ungated.

Open Kurious →
njopendata.aintropy.ai
AIntropy
The hippocampus of your private AI. Building systems that make institutional data finally visible to AI.
aintropy.ai →