AGENT MADNESS

The Gauntlet

Seven AI personas powered by reasoning models from five providers (Opus 4.6, GPT-5.4, Grok 4.20, Gemini 3.1 Pro, Perplexity Sonar) run structured adversarial debate, map evidence into a hypergraph knowledge graph, score claims via Bayesian confidence calibration, and self-evolve by learning from their own failures, returning a cryptographically signed trust score with a full audit trail.

ROUND 1 DEADLINE
VOTING CLOSES THURSDAY, MARCH 26
The Gauntlet is live in Round 1 right now. Voting closes Thursday, March 26, so if you're backing this project, send people into the matchup before the round locks.
The Gauntlet
Builder
Ernie Hobbs
Build Type
Agent Team
Lifecycle
Live product
Consensus Score
84.9
Region
REGION 1
Seed
6
Opponent
FitFoundry
CATEGORIES
Research · Data Analysis · Automation / Workflow
Go Deeper
Most AI verification asks one model to check another. The Gauntlet runs four integrated defense layers:

(1) Adversarial debate. Seven historical-figure personas, each running a different reasoning model from a different provider (Claude Opus 4.6, GPT-5.4, Grok 4.20, Gemini 3.1 Pro, Perplexity Sonar), run four rounds of cross-examination with live web fact-checking. Different training data, different architectures, different blind spots: when one model hallucinates, the others catch it.

(2) Knowledge graph. A hypergraph knowledge graph extracts entity relationships with source provenance from 213 authoritative sources across all 50 states, revealing dependency risks and disconnected knowledge clusters no single model surfaces.

(3) Confidence calibration. Bayesian confidence scoring updates prior/posterior probabilities on every factual claim; claims below threshold get flagged for human review.

(4) Self-evolution. After every run, a self-evolution engine analyzes failures (tool omissions, split consensus, verdicts without evidence), matches them to corrective skills from a skill bank, and injects those learned patterns into future debates. The system gets smarter with every run.

Every completed analysis produces a cryptographically signed certificate (Ed25519) verifiable via a public API endpoint. A Sovereign Edition runs the entire architecture on local open-weight models via Ollama: fully air-gapped, no data leaves the building.

Built by a single non-engineer founder using Claude Code. 110,000+ lines of TypeScript/Python. Three provisional patents filed. Academic validation study underway with Wake Forest University faculty. Live at gauntletscore.com with API access.
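The Bayesian layer described above can be sketched in a few lines. This is a minimal illustration, not the production scoring code: the threshold value, function names, and evidence likelihoods are all assumed for the example; only the prior-to-posterior update itself is standard Bayes' rule.

```python
# Hypothetical sketch of per-claim Bayesian confidence scoring.
REVIEW_THRESHOLD = 0.7  # assumed cutoff; claims scoring below it go to human review


def update_confidence(prior: float, likelihood_true: float, likelihood_false: float) -> float:
    """One Bayes update: P(claim | evidence) from P(evidence | claim true/false)."""
    numerator = likelihood_true * prior
    denominator = numerator + likelihood_false * (1.0 - prior)
    return numerator / denominator


def score_claim(prior: float, evidence: list[tuple[float, float]]) -> tuple[float, bool]:
    """Fold each (P(e|true), P(e|false)) pair into the posterior; flag low scores."""
    posterior = prior
    for likelihood_true, likelihood_false in evidence:
        posterior = update_confidence(posterior, likelihood_true, likelihood_false)
    return posterior, posterior < REVIEW_THRESHOLD


# Two corroborating sources lift a neutral 0.5 prior above the review threshold.
confidence, needs_review = score_claim(0.5, [(0.9, 0.2), (0.8, 0.3)])
```

Each piece of supporting evidence multiplies into the posterior, so weakly supported claims stay below threshold and get routed to a human reviewer rather than silently passing.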
Stack Used
Anthropic Claude Opus 4.6, OpenAI GPT-5.4, xAI Grok 4.20, Google Gemini 3.1 Pro, Google Gemini 3.1 Flash (data parsing), Perplexity Sonar Reasoning Pro — all reasoning models. Sovereign Edition: Ollama local inference with open-weight models. TypeScript/Python, Next.js, Supabase (pgvector), Vercel, Railway, Stripe, Ed25519 cryptographic signing. Entire codebase built with Claude Code.
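The Ed25519 certificate signing mentioned in the stack can be sketched with the `pyca/cryptography` package (an assumption for illustration; the certificate fields and payload layout here are invented, not the product's actual schema):

```python
# Hypothetical sketch of signing and verifying a verdict certificate with Ed25519.
import json

from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

signing_key = Ed25519PrivateKey.generate()  # in production the key would be persisted

certificate = {
    "subject": "example-analysis",  # illustrative payload fields
    "trust_score": 84.9,
    "verdict": "verified",
}
# Canonical serialization (sorted keys) so signer and verifier hash identical bytes.
payload = json.dumps(certificate, sort_keys=True).encode()
signature = signing_key.sign(payload)

# Anyone holding the public key can verify; tampering raises InvalidSignature.
public_key = signing_key.public_key()
public_key.verify(signature, payload)  # no exception means the certificate is authentic
```

Publishing the public key behind an API endpoint lets third parties check a certificate without trusting the server that issued it, which is what makes the audit trail independently verifiable.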