Colossal Biosciences said it cloned red wolves. Is it for real?

April 21, 2026 · 6 min read

ArkOne Insights


The most instructive thing about Colossal Biosciences cloning red wolves is not whether they succeeded. It is that most observers lack the tools to tell the difference — and that the company knows it.

This is not a cynical observation about biotech. It is a precise description of the structural problem facing every executive who has signed an enterprise AI contract in the last three years. The red wolf story is a working model of how extraordinary capability claims circulate through an information environment in which the people most motivated to scrutinise them have been financially incentivised not to.

When “It Works” and “It Does What You Need” Are Different Questions

There is a genuine scientific dispute about whether the red wolf is a distinct species at all. Several population geneticists argue it is a coyote-grey wolf hybrid that stabilised into a distinct breeding population, not a separate evolutionary lineage. Colossal cloned animals from existing red wolf cell lines — animals that may themselves be hybrids. The clones are genetically identical to their source material. Whether that source material constitutes a red wolf in any ecologically meaningful sense is an open question that the announcement did not address.

The C-suite equivalent of this happens in every enterprise AI procurement cycle. A vendor demonstrates a model achieving 94% accuracy on a named benchmark. The benchmark is real. The accuracy figure is real. What is not disclosed is that the benchmark was constructed from data that structurally resembles the training distribution — and that your operational data does not. You have purchased a clone of something whose provenance is contested, and you will discover this approximately eight months into deployment.

The concrete data point here: a 2024 analysis of AI deployments across 200 enterprises found that fewer than a third reported production performance within 15 percentage points of vendor-quoted benchmarks on their specific use cases. The gap is not fraud. It is the red wolf problem — technically accurate claims about a thing that may not be the thing you needed.
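The finding reduces to a comparison any buyer can run themselves. A minimal sketch, with hypothetical numbers; `within_tolerance` is an illustrative name, not a reference to any real tool or to the cited study's methodology:

```python
def within_tolerance(vendor_pct: float, production_pct: float,
                     tolerance_points: float = 15.0) -> bool:
    """Return True if measured production performance lands within
    `tolerance_points` percentage points of the vendor-quoted figure."""
    return (vendor_pct - production_pct) <= tolerance_points

# Hypothetical example: vendor quotes 94% accuracy on a named benchmark,
# production monitoring on your own data measures 71%.
print(within_tolerance(94.0, 71.0))  # gap of 23 points -> False
```

The useful part is not the subtraction; it is deciding, before deployment, which production metric will stand in for `production_pct` and instrumenting it from day one.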

The Independent Verification Deficit

When Colossal published its announcement, the biologists most qualified to evaluate it were largely the same biologists who had received Colossal funding, collaborated on adjacent de-extinction projects, or sat on advisory boards. The independent critical response came mostly from academics outside the immediate field — people with relevant expertise but without the deep familiarity required to catch the specific methodological choices that would separate a robust result from a well-staged one.

This structure is not unique to conservation biology. It maps almost precisely onto how enterprise AI capability claims are validated — or fail to be. The major analyst firms that rate and rank AI vendors derive substantial revenue from those same vendors through advisory contracts, speaking engagements, and sponsored research. Independent benchmarking organisations are typically funded by consortia that include the companies being benchmarked. The researchers best positioned to evaluate a model’s production readiness are often the ones who trained it.

The practical consequence is that due diligence processes that look rigorous — Gartner Magic Quadrant positioning, third-party security audits, reference customer calls — are structurally incapable of catching the specific failure modes that will matter to you. Reference customers are selected by the vendor. Security audits test for known vulnerability classes. Analyst positioning reflects market momentum and financial stability more than operational capability on your problem class.

The one exception, and it is worth naming explicitly: regulatory environments that mandate third-party validation generate genuinely useful signal. The FDA’s requirements around clinical AI tools have produced a body of post-market surveillance data that is operationally honest in a way that almost no commercial AI benchmark is. If your AI procurement sits adjacent to a regulated domain — financial services model risk management, clinical decision support, insurance underwriting — the regulatory audit trail is worth more than any vendor-provided benchmark.

The Conservation Framing as Strategic Misdirection

Colossal has been extraordinarily skilful at something that has nothing to do with genetics: framing its commercial objectives inside a conservation narrative that makes scrutiny feel churlish. Asking hard questions about the red wolf clones is made to feel like opposing species preservation. The halo of ecological urgency obscures questions about what a cloned red wolf population would actually need to become self-sustaining — habitat that does not exist at scale, prey populations that are managed rather than wild, ongoing human intervention to prevent hybridisation with coyotes. The announcement answers the question “can we clone them?” while the operationally important question — “can we restore them?” — goes largely unasked.

Precisely the same rhetorical move appears in enterprise AI. The productivity narrative (“your employees will get hours back”) and the competitive threat narrative (“your competitors are already doing this”) both function as conservation framings. They create an atmosphere in which operational scepticism reads as organisational timidity. The executive who asks “what happens when the model hallucinates in a client-facing context?” is positioned as a blocker rather than as someone doing the job of risk management.

The tell, in both cases, is a conspicuous absence of specificity about what happens next. Colossal’s announcement detailed the cloning process in depth and said almost nothing about the reintroduction plan. AI vendors who present compelling demos rarely specify the remediation process when the model produces a confident wrong answer on a novel input type.


If you are evaluating a new AI capability and the vendor’s evidence consists primarily of benchmark performance on named datasets, apply a 30–40% discount to any projected operational metric and pilot on a representative sample of your actual edge cases before committing to scale deployment.
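The discount rule is mechanical enough to write down. A minimal sketch, assuming the 30–40% haircut above; the function name and the 94% example figure are hypothetical:

```python
def discounted_range(vendor_metric: float,
                     discount_low: float = 0.30,
                     discount_high: float = 0.40) -> tuple[float, float]:
    """Turn a vendor-quoted metric into a conservative planning range
    by applying a 30-40% discount to the claimed figure."""
    return (vendor_metric * (1 - discount_high),   # pessimistic bound
            vendor_metric * (1 - discount_low))    # optimistic bound

# Hypothetical: vendor quotes 94% accuracy on a named benchmark.
low, high = discounted_range(94.0)
print(f"Plan for {low:.1f}%-{high:.1f}% on your own edge cases")
```

Treat the lower bound as the number your business case must survive; the pilot on representative edge cases then tells you where in the range you actually sit.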

If the AI use case sits inside a regulated domain where post-market surveillance data exists, prioritise that data over vendor benchmarks — it is the only signal in the ecosystem generated by an entity without a financial interest in your purchase.

If you find yourself being asked to move quickly because a competitor has announced adoption, treat that urgency as a reason to slow down rather than accelerate. Competitive pressure is the conservation narrative of enterprise AI: emotionally compelling, strategically useful to the vendor, and largely orthogonal to whether the thing works for you.

The red wolves may be real. The question worth asking is whether they are what you need them to be.
