Analysis · February 27, 2026 · By Sara Voss
16M illicit queries
3 accused labs
$100M+ revenue foregone
On February 23, Anthropic accused three Chinese AI companies — DeepSeek, Moonshot AI, and MiniMax — of orchestrating a systematic campaign to extract the capabilities of Claude through model distillation. The allegations: approximately 24,000 fraudulent accounts, over 16 million illicit API queries, and a deliberate effort to train smaller, cheaper models on Claude’s outputs. The story broke during the week of US Congressional testimony on AI competition policy, and it has since metastasized far beyond an intellectual property dispute. This is a national security question.

Model distillation is the process of using the outputs of a large, capable model to train a smaller, more efficient one. The small model learns to mimic the large model’s reasoning patterns without having access to its weights. This is a legitimate, well-documented technique — Google did it openly with PaLM → Gemini Nano, and Meta’s LLaMA papers discuss it extensively.
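Reduced to its essentials, the technique works like this: query a teacher model you cannot inspect, record its output distributions, and fit a student to reproduce them. A toy numeric sketch (illustrative only; a logistic "teacher" stands in for the frontier model, and real LLM distillation fine-tunes on sampled teacher text at vastly larger scale):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# "Teacher": a fixed model we can only query for output probabilities,
# never for its internal parameters.
TEACHER_W, TEACHER_B = 2.0, -0.5
def teacher(x):
    return sigmoid(TEACHER_W * x + TEACHER_B)

# Step 1: harvest teacher outputs (the query campaign, in miniature).
xs = rng.uniform(-3, 3, size=2000)
soft_labels = teacher(xs)

# Step 2: train the student to match the soft labels via
# cross-entropy gradient descent.
w, b, lr = 0.0, 0.0, 0.5
for _ in range(2000):
    p = sigmoid(w * xs + b)
    grad = p - soft_labels          # d(cross-entropy)/d(logit)
    w -= lr * np.mean(grad * xs)
    b -= lr * np.mean(grad)

# The student now approximates the teacher without ever seeing its weights.
err = np.max(np.abs(sigmoid(w * xs + b) - soft_labels))
print(f"max probability gap: {err:.4f}")
```

The point of the sketch is the asymmetry: the teacher's parameters are never exposed, yet enough input/output pairs let the student recover them almost exactly.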
The line between legal and illegal is permission. Academic or internal distillation of a model you have licensed access to: generally permissible. Running 16 million queries through fake accounts on a platform whose terms explicitly prohibit training competing models and reverse engineering: a Terms of Service violation with potential legal exposure under the CFAA and international IP frameworks.
But the legal question understates the issue. Anthropic doesn’t offer commercial Claude access in China, a policy the company has explicitly linked to national security concerns, at the cost of hundreds of millions in foregone revenue. The allegation is that the three labs circumvented this access restriction entirely.

Sixteen million queries is not a usage spike; it’s a training pipeline. At typical Claude API pricing ($3–$15 per million tokens for Claude 3 Opus), and assuming on the order of 1,000 tokens per query, a 16M-query campaign works out to roughly 16 billion tokens, or somewhere between $48,000 and $240,000 depending on where in that price range the traffic falls. That’s a small fraction of the cost of training a frontier model from scratch. The economics of distillation, if the accusation is accurate, are extraordinarily favorable: you get a model that approximates SOTA capabilities for pennies on the dollar of legitimate training compute.
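The arithmetic behind those figures, with the per-query token count as an illustrative assumption rather than a number from the complaint:

```python
# Back-of-envelope cost of the alleged 16M-query campaign.
QUERIES = 16_000_000
TOKENS_PER_QUERY = 1_000                    # assumption, not from the filing
PRICE_PER_M_LOW, PRICE_PER_M_HIGH = 3, 15   # USD, Claude 3 Opus-era range

total_tokens = QUERIES * TOKENS_PER_QUERY   # 16 billion tokens
cost_low = total_tokens / 1_000_000 * PRICE_PER_M_LOW
cost_high = total_tokens / 1_000_000 * PRICE_PER_M_HIGH
print(f"${cost_low:,.0f} to ${cost_high:,.0f}")   # $48,000 to $240,000
```

Even at the top of that range, the outlay sits orders of magnitude below frontier-scale training budgets, which is precisely the structural problem.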
The US AI advantage over China has been maintained partly through chip export controls (restricting access to H100/H200-class GPUs) and partly through access restrictions (US frontier labs declining to sell API access to Chinese organizations). Anthropic’s policy of not serving China commercially is the second pillar — and if distillation bypasses it, the pillar doesn’t hold.
Frontier AI models are dual-use infrastructure. The same reasoning capabilities that make Claude useful for coding and analysis also make it useful for signals intelligence, strategic planning, and adversarial use cases. If a Chinese lab can train a model on 16 million Claude outputs, they haven’t just saved training compute — they’ve extracted a capability that US export policy was explicitly designed to prevent.
> “Access restrictions and chip export controls are only effective if the model’s knowledge cannot be extracted through its outputs. Distillation breaks that assumption entirely.”

MiniMax has been pricing its API access at approximately $0.30 per million tokens — a fraction of Claude’s $3–$15/M range. Aggressive pricing from a well-funded Chinese lab could be justified by lower labor costs, different infrastructure economics, or investor subsidization. But if the distillation allegation is accurate, there’s a third explanation: training costs were externalized onto Anthropic’s infrastructure.
This raises a question that the industry has not yet answered: how many cheap AI products currently available in the market were built — partly or substantially — on capabilities distilled from frontier Western models without authorization? The honest answer is: we don’t know. And that’s the problem.

The reactive playbook of detecting abuse, banning accounts, and filing legal complaints is necessary but insufficient; structural responses are needed. The patchwork legal landscape, summarized below, shows why:
| Scenario | Status | Legal Basis |
|---|---|---|
| Internal distillation, licensed API, research purposes | Permissible | Per-license ToS, academic exemptions |
| Distillation for commercial product, licensed access | License-dependent | ToS §4 (Anthropic); varies by provider |
| Fake account creation to bypass access restrictions | Prohibited | CFAA, ToS, fraud statutes |
| Distillation to circumvent export/access controls | Prohibited | EAR, national security statutes |
| Using distilled model commercially without disclosure | Legal grey area | No current US statute; Congressional debate ongoing |