Go Deeper
Most AI verification asks one model to check another. The Gauntlet runs four integrated defense layers:
(1) Seven historical-figure personas — each backed by a reasoning model drawn from five different providers (Claude Opus 4.6, GPT-5.4, Grok 4.20, Gemini 3.1 Pro, Perplexity Sonar) — run four rounds of cross-examination with live web fact-checking. Different training data, different architectures, different blind spots: when one model hallucinates, the others catch it.
(2) A hypergraph knowledge base extracts entity relationships, with source provenance, from 213 authoritative sources across all 50 states — revealing dependency risks and disconnected knowledge clusters that no single model surfaces.
(3) Bayesian confidence scoring updates a prior-to-posterior probability for every factual claim; claims that fall below a confidence threshold are flagged for human review.
(4) A self-evolution engine analyzes failures after every run — tool omissions, split consensus, verdicts issued without evidence — matches them to corrective skills in a skill bank, and injects those learned patterns into future debates, so the system gets smarter with each run.
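Layer (1) ultimately has to reduce several personas' verdicts to one decision. A minimal sketch of majority consensus with an agreement ratio — the function name and verdict labels are illustrative, not The Gauntlet's actual API:

```python
from collections import Counter

def consensus(verdicts):
    """Majority verdict across personas, plus the agreement ratio."""
    counts = Counter(verdicts)
    top_verdict, votes = counts.most_common(1)[0]
    return top_verdict, votes / len(verdicts)

# Five persona verdicts on one claim; 60% agreement is the kind of split
# decision the failure analysis would want to log.
verdict, agreement = consensus(["true", "true", "false", "true", "unverifiable"])
```

A low agreement ratio is exactly the "split consensus" signal the self-evolution layer feeds on.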
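Layer (2)'s hyperedges can link more than two entities under one sourced relation, and "disconnected knowledge clusters" fall out of a connected-components pass. A sketch under assumed data — the entities, sources, and union-find helper are invented for illustration:

```python
# Hypothetical hyperedges: each links a set of entities and carries the
# source it was extracted from (provenance).
edges = [
    ({"Acme Corp", "State of Ohio", "Contract 42"}, "source:ohio.gov/procurement"),
    ({"Acme Corp", "Subsidiary X"}, "source:sec.gov/filing"),
    ({"Vendor Y", "State of Texas"}, "source:texas.gov/registry"),
]

# Union-find over entities that share a hyperedge; the number of roots is
# the number of disconnected knowledge clusters.
parent = {}

def find(x):
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving
        x = parent[x]
    return x

def union(a, b):
    parent[find(a)] = find(b)

for entities, _source in edges:
    first, *rest = entities
    for entity in rest:
        union(first, entity)

clusters = {find(e) for entities, _ in edges for e in entities}
```

Here the first two edges share "Acme Corp", so their entities collapse into one cluster while "Vendor Y" / "State of Texas" stay isolated — two clusters total.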
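Layer (3) is a standard Bayes update applied per claim. A worked sketch with assumed numbers — the likelihoods and the 0.9 review threshold are illustrative, not The Gauntlet's published parameters:

```python
def bayes_update(prior, p_e_given_true, p_e_given_false):
    """One Bayes step: posterior probability the claim is true given new evidence."""
    numerator = prior * p_e_given_true
    return numerator / (numerator + (1 - prior) * p_e_given_false)

REVIEW_THRESHOLD = 0.9  # assumed cutoff for flagging a claim
posterior = 0.5         # uninformative prior on the claim

# Illustrative likelihoods: a supporting fact-check is taken to be 80% likely
# if the claim is true vs. 30% if false; mirrored for a refutation.
for supports in (True, True, False):
    if supports:
        posterior = bayes_update(posterior, 0.8, 0.3)
    else:
        posterior = bayes_update(posterior, 0.2, 0.7)

needs_human_review = posterior < REVIEW_THRESHOLD
```

Two supporting checks push the posterior to about 0.88; one refutation drags it back to roughly 0.67, below the assumed threshold, so the claim gets routed to a human.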
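Layer (4) can be read as a lookup from observed failure patterns to corrective skills. A sketch where the failure labels mirror the ones named above but the skill bank's contents and function name are invented:

```python
# Hypothetical skill bank mapping failure signatures to corrective guidance.
SKILL_BANK = {
    "tool_omission": "Invoke the web fact-check tool before issuing any verdict.",
    "split_consensus": "Run a tie-break round whenever agreement drops below 60%.",
    "verdict_without_evidence": "Attach at least one cited source to every verdict.",
}

def skills_to_inject(observed_failures):
    """Map one run's failure labels to skills injected into the next debate."""
    return [SKILL_BANK[f] for f in observed_failures if f in SKILL_BANK]

injected = skills_to_inject(["split_consensus", "some_unseen_failure"])
```

Unknown failure labels pass through unmatched, so the bank can grow incrementally without breaking older runs.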
Every completed analysis produces a cryptographically signed certificate (Ed25519), verifiable via a public API endpoint. A Sovereign Edition runs the entire architecture on local open-weight models via Ollama — fully air-gapped, no data leaves the building.
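Signing and verifying such a certificate can be sketched with the widely used `cryptography` package; the payload fields here are invented for illustration, not the real certificate schema:

```python
import json

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# Hypothetical certificate body; canonical JSON keeps the signed bytes deterministic.
certificate = json.dumps(
    {"analysis_id": "demo-123", "verdict": "verified", "confidence": 0.93},
    sort_keys=True, separators=(",", ":"),
).encode()

signing_key = Ed25519PrivateKey.generate()
verify_key = signing_key.public_key()  # what a public verification endpoint would expose

signature = signing_key.sign(certificate)  # 64-byte Ed25519 signature

# Verification raises InvalidSignature if the payload or signature is tampered with.
try:
    verify_key.verify(signature, certificate)
    valid = True
except InvalidSignature:
    valid = False
```

Because verification needs only the public key and the payload, anyone hitting the API can confirm a certificate without trusting the server that returned it.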
Built by a single non-engineer founder using Claude Code. 110,000+ lines of TypeScript/Python. Three provisional patents filed. Academic validation study underway with Wake Forest University faculty. Live at gauntletscore.com with API access.
Stack Used
Anthropic Claude Opus 4.6, OpenAI GPT-5.4, xAI Grok 4.20, Google Gemini 3.1 Pro, Google Gemini 3.1 Flash (data parsing), Perplexity Sonar Reasoning Pro — all reasoning models. Sovereign Edition: Ollama local inference with open-weight models. TypeScript/Python, Next.js, Supabase (pgvector), Vercel, Railway, Stripe, Ed25519 cryptographic signing. Entire codebase built with Claude Code.