AI Benchmarks & Performance · Feb 19, 2026
Gemini 3.1 Pro Hits 77% on ARC-AGI-2: What That Score Actually Means
ARC-AGI-2 is the benchmark designed to resist AI memorization and pattern-matching. Gemini 3.1 Pro just scored 77.1%. Here is what that number means, and what it doesn't.
Maya Chen
AI Researcher & Benchmarks Editor
77.1% ARC-AGI-2
1M token context
~95% human average
<10% previous AI

Key insight: ARC-AGI-2 was specifically engineered by François Chollet to be resistant to AI memorization and pattern-matching. Humans average 95%. Every major AI lab spent years barely cracking 10% on ARC-AGI-1. Gemini 3.1 Pro just posted 77.1% on the harder successor. That requires explanation.

Maya Chen
AI Researcher covering benchmarks, frontier models, and the path to AGI.