Futurism

Anthropic Warns of “Industrial-Scale” Claude Distillation Attacks by Rival AI Labs

The company says DeepSeek, Moonshot, and MiniMax generated 16M+ Claude exchanges via 24,000 fraudulent accounts—raising concerns about safety “stripping,” export controls, and a widening gray market for frontier-model access.

By Behind the Tech · Published about 4 hours ago · 5 min read

What Happened (Facts)

In a post dated 23 Feb 2026, Anthropic says it uncovered three large-scale campaigns aimed at illicitly extracting (“distilling”) Claude’s capabilities to improve competitors’ models. Anthropic attributes the campaigns to DeepSeek, Moonshot, and MiniMax, alleging they collectively generated over 16 million exchanges with Claude using roughly 24,000 fraudulent accounts, violating Anthropic’s terms of service and regional access restrictions.

Anthropic explains that distillation is a common and legitimate training technique: a weaker model is trained on outputs of a stronger model. Frontier labs often distill their own models to make smaller, cheaper versions for customers. The concern here is illicit distillation—using another company’s model outputs to accelerate capability gains “at a fraction of the time and cost” of building those capabilities independently.
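The mechanics of legitimate distillation can be sketched in a few lines: a "student" model is trained to match a "teacher" model's output distribution rather than hard labels, typically by minimizing a KL divergence on temperature-softened probabilities. The following is a minimal toy illustration of that loss, not any lab's actual pipeline:

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw logits into a probability distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    Minimizing this over many teacher outputs is the core of
    knowledge distillation: the student learns to imitate the
    teacher's behavior, not just its final answers.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Toy numbers: disagreeing logits give a positive loss;
# identical logits give a loss of ~0.
teacher = [4.0, 1.0, 0.5]
student = [2.0, 2.0, 1.0]
print(distillation_loss(teacher, student) > 0)          # True
print(abs(distillation_loss(teacher, teacher)) < 1e-9)  # True
```

The illicit variant Anthropic describes swaps the in-house teacher for another company's model, harvesting its outputs at scale to serve as training targets.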

Anthropic says these campaigns are increasing in “intensity and sophistication,” and argues the threat is broader than any single company or region—requiring “rapid, coordinated action” across industry, cloud providers, policymakers, and the wider AI community.

Why Anthropic says it matters

Anthropic argues that illicitly distilled models are likely to lack critical safety safeguards, creating potential national security risks. The post claims US AI labs build systems intended to prevent misuse in areas like bioweapons development and malicious cyber activity, and that distillation could copy “capabilities” while shedding protections. Anthropic also warns that if such models are then used by military, intelligence, or surveillance systems—or open-sourced—the risks multiply.

Export controls angle

Anthropic connects distillation to export controls on advanced chips. It argues distillation undermines export controls by allowing foreign labs (including those “subject to the control of the Chinese Communist Party,” per the post) to close capability gaps without building everything from scratch. At the same time, Anthropic claims the ability to execute distillation at scale still requires access to advanced compute—so detecting these attacks should reinforce the rationale for restricting high-end chip access.

What Anthropic says it found

Anthropic says each campaign followed a broadly similar playbook:

Use fraudulent accounts and proxy services to access Claude at scale while evading detection

Generate large volumes of highly structured, repetitive prompts designed to extract specific capabilities rather than serve normal user needs

Focus on Claude’s “most differentiated capabilities”: agentic reasoning, tool use, and coding

Anthropic claims it attributed each campaign “with high confidence” using indicators such as IP address correlation, request metadata, infrastructure indicators, and—in some cases—corroboration from industry partners.
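Anthropic does not describe its attribution methods beyond naming the signal types, but "IP address correlation" typically means linking accounts that share infrastructure. A hypothetical sketch of that one signal (all account IDs and addresses here are invented; real attribution would weigh many signals together):

```python
from collections import defaultdict

def link_accounts(events):
    """Group account IDs that share an infrastructure indicator
    (e.g., a source IP or TLS fingerprint), merging transitively.

    Purely illustrative of 'IP address correlation'; a production
    system would score and combine many independent signals.
    """
    by_indicator = defaultdict(set)
    for account, indicator in events:
        by_indicator[indicator].add(account)
    # Merge any account sets that overlap (connected components).
    clusters = []
    for accounts in by_indicator.values():
        merged = set(accounts)
        rest = []
        for c in clusters:
            if c & merged:
                merged |= c
            else:
                rest.append(c)
        clusters = rest + [merged]
    return clusters

# acct1-acct3 chain together through shared addresses; acct4 stands alone.
events = [
    ("acct1", "203.0.113.7"),
    ("acct2", "203.0.113.7"),
    ("acct2", "198.51.100.9"),
    ("acct3", "198.51.100.9"),
    ("acct4", "192.0.2.1"),
]
print(sorted(sorted(c) for c in link_accounts(events)))
# [['acct1', 'acct2', 'acct3'], ['acct4']]
```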

DeepSeek (150,000+ exchanges)

Anthropic says DeepSeek targeted:

reasoning across diverse tasks

rubric-based grading tasks (to make Claude function like a reward model for reinforcement learning)

censorship-safe alternatives to policy-sensitive queries

Anthropic also describes coordinated traffic patterns consistent with “load balancing” across accounts. It says it observed prompts attempting to make Claude articulate internal reasoning step-by-step to generate “reasoning training data” at scale.

Moonshot AI (3.4M+ exchanges)

Anthropic says Moonshot targeted:

agentic reasoning and tool use

coding and data analysis

computer-use agent development

computer vision

Anthropic claims Moonshot used “hundreds” of fraudulent accounts across multiple access pathways, making detection harder. It also alleges Moonshot later tried a more targeted approach to reconstruct Claude’s reasoning traces.

MiniMax (13M+ exchanges)

Anthropic says MiniMax focused on:

agentic coding

tool use and orchestration

Anthropic claims it detected this campaign while still active, gaining visibility into the lifecycle from data generation to model launch. It also says MiniMax pivoted quickly when Anthropic released a new model—redirecting significant traffic within 24 hours to extract capabilities from the latest system.

How distillers allegedly got access

Anthropic says it does not offer commercial access to Claude in China (or to certain subsidiaries). It alleges labs circumvented this using commercial proxy services that resell access at scale, operating “hydra cluster” networks—large pools of fraudulent accounts that distribute traffic across Anthropic’s API and third-party cloud platforms. When one account is banned, another replaces it, and the proxy traffic is mixed with normal traffic to blend in.

Anthropic’s response measures

Anthropic says it is strengthening defenses via:

Detection: classifiers + behavioral fingerprinting to spot distillation patterns (including chain-of-thought elicitation patterns)

Intelligence sharing: sharing indicators with other labs, cloud providers, and relevant authorities

Access controls: stricter verification for common account-creation pathways

Countermeasures: product/API/model-level steps to reduce the usefulness of outputs for illicit distillation without harming legitimate customers
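Anthropic does not reveal how its classifiers or "behavioral fingerprinting" work, but one plausible signal follows from its own description: extraction traffic is "highly structured, repetitive," while organic usage is varied. A hypothetical heuristic along those lines (the feature and threshold are illustrative, not Anthropic's):

```python
def jaccard(a, b):
    """Jaccard similarity between the token sets of two prompts."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)

def repetitiveness_score(prompts):
    """Mean pairwise Jaccard similarity across an account's prompts.

    Normal users ask varied questions (low score); templated
    extraction pipelines reuse near-identical prompts (high score).
    """
    pairs = [(a, b) for i, a in enumerate(prompts) for b in prompts[i + 1:]]
    if not pairs:
        return 0.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

def flag_account(prompts, threshold=0.6):
    """Illustrative flagging rule; a real system would combine many
    signals (IP correlation, metadata, timing) before acting."""
    return repetitiveness_score(prompts) >= threshold

templated = [
    "Grade this answer on a 1-5 rubric: answer one",
    "Grade this answer on a 1-5 rubric: answer two",
    "Grade this answer on a 1-5 rubric: answer three",
]
organic = [
    "How do I center a div in CSS?",
    "Write a haiku about autumn rain",
    "Explain TCP slow start briefly",
]
print(flag_account(templated), flag_account(organic))  # True False
```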

Analysis (Interpretation)

Anthropic’s post is one of the clearest public claims yet that “model extraction” is evolving from isolated abuse into industrial competition tactics. If the numbers are accurate—16M+ exchanges via 24k accounts—this looks less like opportunistic scraping and more like an organized training pipeline.

1) The real strategic risk is “capabilities without constraints”

Anthropic’s strongest point is not that rivals copy answers—it’s that they may copy capabilities while discarding or weakening safety behaviors and governance. Even if distilled models are less capable overall, losing safety guardrails could make them more dangerous per unit of capability—especially if they become widely available or embedded into sensitive state systems.

2) The gray market for frontier-model access is becoming infrastructure

The “proxy services + hydra clusters” description suggests an ecosystem that may persist regardless of any single lab’s actions. If access brokering is profitable, the market will keep innovating around verification and bans. This becomes a platform-level issue—similar to spam or ad fraud—where enforcement is a constant arms race.

3) Attribution and incentives will be contested

Anthropic claims high-confidence attribution via metadata and partner corroboration, but these claims will inevitably be disputed by accused parties. Also, the public framing serves multiple incentives for Anthropic:

deterring abuse by showing detection capability

shaping policy debates around export controls

signaling to enterprise and government buyers that Anthropic takes model security seriously

None of that makes the claims false—but it does mean the post is both security disclosure and strategic messaging.

4) Defensive measures may reshape how frontier models are offered

If distillation attacks scale, labs may respond by tightening access and reducing “extractability.” That could mean more KYC-like verification, stricter rate limits, more aggressive anomaly detection, and potentially output transformations that preserve usefulness for humans but reduce training value for competitors. The tradeoff: security vs openness. Tighter controls can frustrate legitimate developers and researchers and may concentrate power further inside a few vendors.
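"Stricter rate limits" in practice often means a token-bucket limiter per account or key: bursty human use mostly passes, while sustained high-volume extraction drains the bucket and gets throttled. A generic sketch of that standard technique, not Anthropic's implementation:

```python
import time

class TokenBucket:
    """Generic token-bucket rate limiter.

    Each request spends one token; tokens refill at `rate` per
    second up to `capacity`. `now` is injectable for testing.
    """

    def __init__(self, rate, capacity, now=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.now = now
        self.last = now()

    def allow(self):
        t = self.now()
        # Refill based on elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (t - self.last) * self.rate)
        self.last = t
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

# With a fake clock: capacity 3, refill 1 token/sec.
clock = [0.0]
bucket = TokenBucket(rate=1.0, capacity=3, now=lambda: clock[0])
burst = [bucket.allow() for _ in range(5)]  # first 3 pass, then throttled
clock[0] += 2.0                             # 2 seconds later, 2 tokens refill
print(burst, bucket.allow())                # [True, True, True, False, False] True
```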

5) The industry coordination problem is real—and hard

Anthropic calls for coordinated action among labs, cloud providers, and policymakers. That’s sensible, but difficult: competitors don’t naturally share intel, cloud providers may have conflicting incentives, and governments vary widely in priorities. Meanwhile, attackers only need one weak link—one proxy network, one leaky onboarding path, one under-monitored reseller—to keep operating.

Bottom line: This announcement is a sign that frontier AI competition is shifting from “who trains the biggest model” to “who can protect their model’s value.” If Anthropic is right, distillation attacks aren’t just IP theft—they’re a pathway for rapid capability diffusion without the safety norms and controls that frontier labs claim to be building.


    © 2026 Creatd, Inc. All Rights Reserved.