Toy Demo
Grade the Kitchen
A toy agentic analytics harness. Ask questions about LA restaurant health inspections and see three approaches side by side: a raw LLM with no data, the same LLM with data stuffed into context, and a structured harness with defined metrics, validation, and interpretation. The model is not the bottleneck. The structure around it is.
How It Works
The harness uses a 5-stage pipeline: your question is classified into a specific metric (avg health score, grade distribution, trends, etc.), a query plan is built, pre-computed aggregates are looked up from Deno KV, the results are validated for sample size and answerability, and then a narrative is generated from the structured data.
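To make the five stages concrete, here is a minimal sketch of how such a pipeline could be wired together. It is not the demo's actual source. Every name in it (classify, buildPlan, lookup, validate, narrate, MIN_SAMPLE_SIZE) is hypothetical, the numbers are made-up placeholders, an in-memory Map stands in for Deno KV, and keyword rules and a string template stand in for the steps where the real harness would presumably call the model.

```typescript
// Hypothetical sketch of a five-stage harness; all names are illustrative.
type Metric = "avg_health_score" | "grade_distribution" | "score_trend";

interface QueryPlan {
  metric: Metric;
  cuisine: string | null;
}

interface Aggregate {
  value: number;
  sampleSize: number;
}

// Stand-in for the key-value store (the demo itself reads from Deno KV).
// Values are invented placeholders, not real inspection results.
const store = new Map<string, Aggregate>([
  ["avg_health_score:thai", { value: 90, sampleSize: 120 }],
  ["avg_health_score:fusion", { value: 95, sampleSize: 3 }],
]);

const KNOWN_CUISINES = ["thai", "fusion"]; // assumed vocabulary
const MIN_SAMPLE_SIZE = 30; // assumed threshold

// Stage 1: classify the question into one metric (simple keyword rules here).
function classify(question: string): Metric {
  const q = question.toLowerCase();
  if (q.includes("trend") || q.includes("over time")) return "score_trend";
  if (q.includes("grade")) return "grade_distribution";
  return "avg_health_score";
}

// Stage 2: build a query plan from the metric plus any cuisine mentioned.
function buildPlan(question: string): QueryPlan {
  const q = question.toLowerCase();
  const cuisine = KNOWN_CUISINES.find((c) => q.includes(c)) ?? null;
  return { metric: classify(question), cuisine };
}

// Stage 3: look up the pre-computed aggregate for that plan.
function lookup(plan: QueryPlan): Aggregate | undefined {
  return store.get(`${plan.metric}:${plan.cuisine ?? "all"}`);
}

// Stage 4: validate answerability and sample size.
// Returns a reason string when the question should not be answered, else null.
function validate(agg: Aggregate | undefined): string | null {
  if (agg === undefined) return "no pre-computed aggregate matches this question";
  if (agg.sampleSize < MIN_SAMPLE_SIZE) {
    return `only ${agg.sampleSize} inspections (minimum ${MIN_SAMPLE_SIZE})`;
  }
  return null;
}

// Stage 5: turn the structured result into a narrative (template stand-in).
function narrate(plan: QueryPlan, agg: Aggregate): string {
  return `${plan.metric} for ${plan.cuisine ?? "all cuisines"}: ${agg.value} (n=${agg.sampleSize}).`;
}

function answer(question: string): string {
  const plan = buildPlan(question);
  const agg = lookup(plan);
  const problem = validate(agg);
  if (agg === undefined || problem !== null) {
    return `Cannot answer reliably: ${problem}.`;
  }
  return narrate(plan, agg);
}

console.log(answer("What is the average health score for Thai restaurants?"));
// prints "avg_health_score for thai: 90 (n=120)."
```

In this sketch the narrative step only ever sees a validated aggregate, so a question backed by three inspections gets a refusal instead of a confident number.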
The raw LLM gets the same question with no data, no metrics, no validation. It answers from training data — which means hedged generalizations and confident vibes.
Data + LLM gets the same data the harness uses, dumped into the context window. Same model, same data — but no metric definitions, no validation rules, no interpretation framework. This is the "just stuff it in the prompt" approach. It's better than vibes, but watch how it compares to structured analysis.
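One way to picture the difference between the three approaches is to look at what each one hands to the model. The sketch below is an assumption about the general shape, not the demo's real prompts. The function names, the InspectionRow and MetricResult types, and the prompt wording are all hypothetical.

```typescript
// Hypothetical prompt builders for the three approaches; wording is illustrative.
interface InspectionRow {
  cuisine: string;
  score: number;
}

interface MetricResult {
  metric: string;
  definition: string;
  value: number;
  sampleSize: number;
}

// Raw LLM: the question and nothing else.
function rawPrompt(question: string): string {
  return question;
}

// Data + LLM: the same question with the data dumped into the context.
function dataPrompt(question: string, rows: InspectionRow[]): string {
  return `${question}\n\nData:\n${JSON.stringify(rows)}`;
}

// Harness: the question plus one defined, validated metric result.
function harnessPrompt(question: string, result: MetricResult): string {
  return [
    question,
    "",
    `Metric: ${result.metric} (${result.definition})`,
    `Value: ${result.value}`,
    `Sample size: ${result.sampleSize}`,
    "Describe only what these figures support.",
  ].join("\n");
}
```

The model call is the same in all three cases. What changes is whether the model has to work out the metric, the arithmetic, and the caveats on its own, or is given them already computed.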
The data: 106,694 LA County restaurant health inspections (2023–2025) from the Department of Public Health, pre-computed into 2,600+ metric aggregates across 34 cuisine types and 60+ neighborhoods.
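Pre-computing aggregates can be as simple as grouping rows by a key and storing the average along with the sample size. The sketch below shows that idea under stated assumptions: the Inspection shape, the precomputeAverages helper, and the three sample rows are all invented for illustration and are not taken from the real dataset or the demo's code.

```typescript
// Hypothetical pre-computation step; the row shape and helper name are assumptions.
interface Inspection {
  cuisine: string;
  neighborhood: string;
  score: number;
}

interface AvgAggregate {
  avgScore: number;
  sampleSize: number;
}

// Group rows by an arbitrary key and keep the mean score plus the row count.
function precomputeAverages(
  rows: Inspection[],
  keyOf: (row: Inspection) => string,
): Map<string, AvgAggregate> {
  const sums = new Map<string, { total: number; count: number }>();
  for (const row of rows) {
    const key = keyOf(row);
    const acc = sums.get(key) ?? { total: 0, count: 0 };
    acc.total += row.score;
    acc.count += 1;
    sums.set(key, acc);
  }
  const out = new Map<string, AvgAggregate>();
  sums.forEach((acc, key) => {
    out.set(key, { avgScore: acc.total / acc.count, sampleSize: acc.count });
  });
  return out;
}

// Made-up sample rows, not real inspection records.
const sampleRows: Inspection[] = [
  { cuisine: "thai", neighborhood: "Hollywood", score: 90 },
  { cuisine: "thai", neighborhood: "Koreatown", score: 94 },
  { cuisine: "korean", neighborhood: "Koreatown", score: 88 },
];

const byCuisine = precomputeAverages(sampleRows, (r) => r.cuisine);
const byNeighborhood = precomputeAverages(sampleRows, (r) => r.neighborhood);
console.log(byCuisine.get("thai")); // { avgScore: 92, sampleSize: 2 }
```

Keeping the sample size next to each average is what lets a later validation step refuse to answer from a thin slice. In a Deno deployment each entry could then be written to Deno KV under a structured key, though how the demo actually lays out its keys is not stated here.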
Built by Aaron Williams + Kazan 🌋 (AI). Powered by OpenAI GPT-4.1 models. Source on GitHub.