FDA Approved 100+ AI Medical Tools. Nobody Knows How They Work.

The FDA quietly approved 50 AI-enabled medical devices in the final two weeks of December 2025 alone. By now, the agency has authorized over 100 AI tools for clinical use — more than double the number from just two years ago. The approvals are accelerating. The oversight is not.

On December 8, 2025, the FDA announced its first-ever qualification of an AI tool specifically designed for drug development: AIM-NASH, a system that uses cloud-based algorithms to score liver biopsies in metabolic dysfunction-associated steatohepatitis (MASH) clinical trials. The system analyzes images of liver tissue and assigns numerical scores for steatosis, inflammation, and fibrosis according to standardized research protocols. It's a narrow use case — drug development, not patient diagnosis — but it's a milestone that signals where AI is actually winning in healthcare: not in replacing doctors, but in standardizing the messy, variable work that slows down drug discovery.
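For a sense of what "standardized" means here: MASH trials typically use the NASH CRN histologic scale, which grades steatosis 0-3, lobular inflammation 0-3, and fibrosis 0-4. Below is a minimal sketch of the kind of structured score record such a tool might emit; the field names and validation logic are my own illustration, not AIM-NASH's actual output format.

```python
from dataclasses import dataclass

# Hypothetical structured output for an AI biopsy-scoring tool.
# Ranges follow the NASH CRN scale commonly used in MASH trials:
# steatosis 0-3, lobular inflammation 0-3, fibrosis stage 0-4.
# Field names and validation are illustrative, not AIM-NASH's API.

@dataclass(frozen=True)
class BiopsyScore:
    slide_id: str
    steatosis: int       # 0-3
    inflammation: int    # 0-3 (lobular)
    fibrosis_stage: int  # 0-4

    def __post_init__(self):
        checks = [
            (0 <= self.steatosis <= 3, "steatosis"),
            (0 <= self.inflammation <= 3, "inflammation"),
            (0 <= self.fibrosis_stage <= 4, "fibrosis_stage"),
        ]
        for ok, name in checks:
            if not ok:
                raise ValueError(f"{name} out of protocol range")

score = BiopsyScore(slide_id="trial-042/biopsy-7", steatosis=2,
                    inflammation=1, fibrosis_stage=3)
print(score)
```

The point of a record like this isn't sophistication. It's that every biopsy in every trial arm gets scored on the same scale, by the same procedure, every time.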

The real story isn't the speed of approval. It's what's missing from it.

The Transparency Problem

The FDA maintains a database of approved AI medical devices, but the summaries are skeletal. The agency itself acknowledges that published decision summaries "are not all inclusive and do not include most of the information that may be submitted in an application." Translation: the public sees a fraction of the validation data. A doctor considering whether to deploy a tool in their clinic can't easily access the clinical evidence that justified its approval.

This matters because the performance claims are often modest, sometimes contradictory, and almost always dependent on how you measure them.

Take iCAD's ProFound AI, a breast cancer detection tool that's been heavily marketed to radiology departments. The company claims a 23% relative increase in cancer detection rate based on a study of 9 radiologists over 2 years. Separately, they cite a 6% improvement in cancer detection performance versus non-AI readers. They also claim the system can detect cancers 2-3 years earlier and cut reading time in half.

These aren't contradictory — they measure different things. But they're also not directly comparable to other AI tools. A 2023 Lancet Oncology study found that AI-supported mammography screening enabled radiologists to detect 20% more breast cancers than traditional screening alone. Is 20% better than 23%? Is it the same study? The data isn't organized in a way that lets clinicians make that comparison.

The FDA's approval process doesn't require companies to standardize how they report performance. So each tool comes with its own metrics, its own study design, its own claims. A radiologist deploying new software has to become a statistician to figure out what they're actually getting.
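To see why, here's a toy calculation. The numbers are invented, not taken from either study, but they show how the same "percent better" can mean very different things depending on the baseline detection rate.

```python
# Why relative improvements from different studies aren't comparable:
# the same "percent better" means different absolute gains depending
# on the baseline. All numbers here are invented for illustration.

def extra_cancers_found(baseline_per_1000, relative_gain, screened=100_000):
    """Additional cancers detected per `screened` exams."""
    baseline = baseline_per_1000 / 1000 * screened
    return baseline * relative_gain

# Study A: 23% relative gain on a baseline of 4 cancers per 1,000 exams.
a = extra_cancers_found(baseline_per_1000=4.0, relative_gain=0.23)
# Study B: 20% relative gain on a baseline of 5 cancers per 1,000 exams.
b = extra_cancers_found(baseline_per_1000=5.0, relative_gain=0.20)

print(f"Study A: {a:.0f} extra cancers per 100k exams")  # 92
print(f"Study B: {b:.0f} extra cancers per 100k exams")  # 100
```

The "worse" relative number finds more cancers in absolute terms, because the baselines differ. Without standardized reporting, that arithmetic is left as an exercise for the clinician.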

The Explainability Crisis

In March 2026, MIT researchers published work on a fundamental problem with AI in high-stakes settings: users need to understand why a model made a prediction, not just that it made one. In medical diagnostics, that need is acute. A radiologist using AI to flag a potential tumor needs to know what features the algorithm detected, so they can evaluate whether the AI's reasoning is sound.

Yet most approved AI medical devices don't provide meaningful explanations. They output a score, a flag, or a recommendation — but not the reasoning. The FDA has issued guidance documents on "transparency for machine learning-enabled medical devices," but these are recommendations, not requirements. Approval doesn't hinge on explainability.

This creates a trust problem that no amount of clinical validation can solve. A tool can be statistically superior to human radiologists and still be unusable if clinicians can't understand its reasoning. Worse, it creates liability questions: if an AI makes a wrong call and a doctor trusted it without being able to verify the logic, who's responsible?

Both AIM-NASH and ProFound AI explicitly require human clinicians to review and validate AI outputs before accepting them. The FDA's language is clear: "pathologists are fully responsible for final interpretation, reviewing the whole slide image and AIM-NASH outputs before accepting or rejecting the AI-generated scores." This isn't a feature. It's an admission that the AI isn't trusted enough to stand alone.

Where AI Is Actually Winning

The disconnect between hype and reality becomes clearer when you look at what AI is actually approved for. AIM-NASH isn't a clinical diagnostic tool. It's a drug development tool — designed to standardize how liver biopsies are scored in MASH clinical trials. The use case is narrower than it sounds, but it's also more realistic than the "AI replaces radiologists" narrative.

Drug development is slow and expensive partly because human pathologists score tissue samples inconsistently. One pathologist might grade fibrosis as stage 2; another might call it stage 3. This variability inflates the sample sizes needed for clinical trials, which drives up costs and timelines. AI that standardizes scoring — that produces consistent, reproducible measurements — has genuine value. It doesn't replace the pathologist's judgment. It removes noise from the system.
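A back-of-envelope power calculation shows how much that noise costs. This uses the standard two-sample sample-size approximation; the variance numbers are invented purely for illustration.

```python
# Rater noise adds variance, and the required N per arm in a two-sample
# comparison scales with total variance. Standard approximation:
#   n = 2 * (z_alpha + z_beta)^2 * sigma^2 / delta^2
# All variance numbers below are hypothetical.

Z_ALPHA = 1.96  # two-sided alpha = 0.05
Z_BETA = 0.84   # power = 80%

def n_per_arm(sigma, delta):
    """Approximate subjects per arm to detect effect `delta` given SD `sigma`."""
    return 2 * (Z_ALPHA + Z_BETA) ** 2 * sigma ** 2 / delta ** 2

delta = 0.5          # target effect: half a fibrosis stage
sigma_biology = 0.8  # true between-patient variability (hypothetical)
sigma_rater = 0.6    # added inter-pathologist scoring noise (hypothetical)

sigma_noisy = (sigma_biology**2 + sigma_rater**2) ** 0.5
print(f"Consistent scoring: n = {n_per_arm(sigma_biology, delta):.0f} per arm")
print(f"With rater noise:   n = {n_per_arm(sigma_noisy, delta):.0f} per arm")
```

With these made-up numbers, removing the rater-noise term cuts the required trial size by roughly a third. That reduction, not diagnostic brilliance, is the value proposition.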

The same logic applies to radiology AI, though the marketing often obscures it. Radiologists read hundreds of images per day. Fatigue and variability are built into the job. An AI system that catches cancers the tired radiologist missed, or that flags suspicious areas for closer review, is a productivity tool. It's not replacing radiologists — it's helping an overworked workforce keep up with exponentially growing imaging volume.

Radiology is one of the fastest-growing medical fields, with double-digit employment growth for decades. If AI were actually replacing radiologists, that growth would be flattening. It's not. The real story is that AI is helping radiologists handle more work, more consistently, with fewer misses.

The Regulatory Lag

The FDA is approving AI tools faster than it's developing guidance on how to evaluate them. The agency has published principles for "good machine learning practice" and "predetermined change control plans," but these are guidelines, not requirements. There's no binding standard for explainability, no requirement for adversarial testing, no mandate that vendors disclose failure modes.

This creates a gap between the pace of technology and the pace of oversight. Companies are shipping tools faster than regulators can develop frameworks to evaluate them. By the time the FDA publishes guidance on concept bottleneck models or adversarial robustness, the market will have moved on to something else.

The 100+ approved devices represent a bet that clinical validation is enough. That the clinical trials prove the tool works, so it's safe to deploy. But clinical validation doesn't answer the questions that matter in practice: Why did this tool miss this cancer? Can I trust its recommendation on this edge case? What happens when the patient population shifts and the tool sees data it wasn't trained on?

These are the questions that will define the next phase of AI in healthcare. And the FDA isn't set up to answer them yet.

Field Notes

I've been digging through the FDA's AI medical device database and the clinical literature, and here's what strikes me: the agency is treating AI tools like traditional medical devices, when they're actually something different. A traditional device is static. You validate it once, it gets approved, it stays the same for years. An AI tool is dynamic. It can drift. Its performance can degrade. It can fail in ways that only emerge after deployment.

The FDA's approval process assumes you can validate a device once and then it's safe forever. But machine learning doesn't work that way. A model trained on 2023 data might perform differently on 2026 patients. A tool validated in Boston might fail in rural Texas. The agency has issued guidance on "predetermined change control plans" — basically, pre-approved ways to update models — but this is still treating AI like software patches, not like living systems that need continuous monitoring.
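What would continuous monitoring even look like? Here's a minimal sketch: compare the model's score distribution on recent patients against the distribution seen at validation, using the population stability index. The data is synthetic, and the thresholds are an industry rule of thumb, not a regulatory standard.

```python
import numpy as np

# Minimal post-deployment drift check: population stability index (PSI)
# between validation-era scores and recent deployed scores.
# Data is synthetic; thresholds are a common convention, not a regulation.

def psi(expected, observed, bins=10):
    """Population stability index between two score samples."""
    edges = np.histogram_bin_edges(expected, bins=bins)
    e_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    o_pct = np.histogram(observed, bins=edges)[0] / len(observed)
    e_pct = np.clip(e_pct, 1e-6, None)  # avoid log(0)
    o_pct = np.clip(o_pct, 1e-6, None)
    return float(np.sum((o_pct - e_pct) * np.log(o_pct / e_pct)))

rng = np.random.default_rng(0)
validation_scores = rng.beta(2, 5, size=5000)    # validation-era cohort
deployed_scores = rng.beta(2.6, 4.4, size=5000)  # shifted later population

value = psi(validation_scores, deployed_scores)
print(f"PSI = {value:.3f}")
# Common rule of thumb: < 0.1 stable, 0.1-0.25 watch, > 0.25 investigate.
if value > 0.25:
    print("Flag for review: score distribution has drifted from validation.")
```

Twenty lines of monitoring code isn't the hard part. The hard part is that nobody is required to run it.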

The other thing that jumped out at me: nobody's talking about the fact that AI's first major FDA win is in drug development, not clinical diagnosis. AIM-NASH standardizes measurements for research. That's not sexy. But it might be more important than any AI diagnostic tool, because it solves a real bottleneck in drug discovery. The narrative everyone wants is "AI replaces radiologists." The narrative that's actually happening is "AI becomes infrastructure for research and clinical workflow optimization." Less dramatic, more durable.

Finally: the explainability problem is going to explode. MIT's March 2026 research on concept bottleneck models is the canary in the coal mine. Within two years, I expect the FDA to require AI tools to provide human-understandable explanations of their predictions. This will be a major barrier to approval for black-box models. It will also force companies to rethink their architectures. The tools that can explain themselves will win. The tools that can't will be forced to add explainability layers, which will slow them down and reduce their accuracy. We're about to see a phase transition in how AI tools are built.
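For readers who haven't met the term: a concept bottleneck model predicts human-readable concepts first, then makes its final call only from those concepts, so the intermediate layer is inspectable by design. Here's a minimal sketch with synthetic data and made-up concept names, not MIT's actual implementation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Minimal concept bottleneck model (CBM) sketch: raw inputs -> predicted
# human-readable concepts -> final label computed *only* from concepts.
# A clinician can inspect the concept layer to audit the prediction.
# Data, features, and concept names are all synthetic.

rng = np.random.default_rng(1)
n = 2000
X = rng.normal(size=(n, 16))  # stand-in for extracted image features

# Ground-truth concepts derived (noiselessly, for simplicity) from inputs.
concepts_true = np.stack([
    (X[:, 0] + X[:, 1] > 0),    # e.g. "spiculated margin"
    (X[:, 2] - X[:, 3] > 0.5),  # e.g. "architectural distortion"
    (X[:, 4] > 1.0),            # e.g. "microcalcifications"
], axis=1).astype(int)
y = (concepts_true.sum(axis=1) >= 2).astype(int)  # label depends on concepts

# Stage 1: one classifier per concept.
concept_models = [
    LogisticRegression(max_iter=1000).fit(X, concepts_true[:, j])
    for j in range(concepts_true.shape[1])
]
concept_preds = np.stack([m.predict(X) for m in concept_models], axis=1)

# Stage 2: the label model sees only the concept layer, never raw features.
label_model = LogisticRegression(max_iter=1000).fit(concept_preds, y)

case = concept_preds[:1]
names = ["spiculated margin", "architectural distortion", "microcalcifications"]
print("Concepts detected:", [nm for nm, v in zip(names, case[0]) if v])
print("Prediction:", label_model.predict(case)[0])
```

The design choice is the point: forcing the final decision through named concepts costs some raw accuracy, but every prediction arrives with a checklist a human can dispute.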

What Comes Next

The 100+ approved devices are just the beginning. The real question is whether the FDA's oversight can keep pace. Right now, the agency is in reactive mode — approving tools as they come, publishing guidance after the fact. What's needed are proactive frameworks: standardized metrics for performance reporting, mandatory explainability requirements, continuous monitoring systems that flag when a tool's performance drifts.

The radiologists and pathologists deploying these tools today are guinea pigs in a real-time experiment. They're learning what works and what doesn't. The FDA is learning too. But the learning is happening in clinics and hospitals, not in regulatory frameworks. By the time the agency develops binding standards, hundreds of thousands of patients will have been diagnosed or treated using tools that weren't evaluated against those standards.

That's not a failure of the FDA. It's the nature of innovation moving faster than oversight can follow. But it's also a reminder that "approved by the FDA" doesn't mean "fully understood." It means "validated enough to try." The next generation of AI medical tools will be approved faster, deployed wider, and understood less completely than the last. The question is whether the healthcare system can handle that uncertainty.

The answer, probably, is yes — because medicine has always been practiced under uncertainty. Doctors make decisions with incomplete information all the time. AI tools that are 6-23% better than human judgment, even if they're not fully explainable, are still an improvement. But that's a lower bar than the hype suggests. And it's worth understanding what we're actually getting before we celebrate the arrival of AI medicine.