Google's Tool That Sniffs Out Unsafe AI Models

May 19, 2026 · via GitHub · @GoogleCloudPlatform
AI · open-source · tools

A lock-check for AI models

Imagine you hire a contractor through an agency. The agency tells you he's been background-checked. But what if someone swapped the paperwork? You'd want a way to verify it yourself — quickly, without running a six-week process.

That's roughly the problem Google just helped solve for anyone building products on top of open-weight AI models.

Google released a free, open-source tool called AMS (Activation-based Model Scanner). It checks whether an AI model still has its safety training intact, or whether someone has stripped it out to make the model do things it was originally trained to refuse.

The clever part: it doesn't test the model by asking it dangerous questions and judging the answers. Instead, it looks inside the model, at its internal activations (the numbers the model computes as it processes text), for mathematical patterns that should be there if safety training is present, and confirms they are. The whole check takes 10 to 40 seconds.
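For the technically curious, here is a toy sketch of the general idea behind an activation-based check. It is not AMS's actual algorithm, which this note doesn't describe. It assumes the "refusal direction" framing from interpretability research: average a trusted model's internal activations on harmful and on harmless prompts, treat the difference as the signal safety training leaves behind, and then confirm a model's activations still line up with it. The four-number vectors are made up (real activations have thousands of dimensions), and `refusal_direction`, `safety_signal_present`, `ablate` and the 0.5 threshold are hypothetical names and values chosen for illustration.

```python
# Toy illustration of an activation-based safety check.
# ASSUMPTION: this is NOT AMS's algorithm. It sketches the general
# "refusal direction" idea, with made-up 4-number vectors standing in for a
# model's real internal activations (which have thousands of dimensions).
import math


def mean(vectors):
    """Average equal-length vectors coordinate by coordinate."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def dot(a, b):
    """Dot product; with a unit-length b it measures how far a extends along b."""
    return sum(x * y for x, y in zip(a, b))


def refusal_direction(harmful_acts, harmless_acts):
    """Hypothetical helper: unit vector from the average 'harmless' activation
    toward the average 'harmful' one, measured on a trusted reference model."""
    diff = [h - s for h, s in zip(mean(harmful_acts), mean(harmless_acts))]
    length = math.sqrt(dot(diff, diff))
    return [x / length for x in diff]


def safety_signal_present(harmful_acts, direction, threshold=0.5):
    """Hypothetical check: do a model's activations on harmful prompts still
    point along the refusal direction? The 0.5 threshold is arbitrary."""
    avg_projection = sum(dot(a, direction) for a in harmful_acts) / len(harmful_acts)
    return avg_projection > threshold


def ablate(activation, direction):
    """Subtract the part of an activation that lies along a unit `direction`,
    roughly mimicking what abliteration does to a model."""
    p = dot(activation, direction)
    return [x - p * d for x, d in zip(activation, direction)]


# Made-up activations from a trusted, safety-trained reference model.
harmless = [[0.1, 0.2, 0.0, 0.3], [0.0, 0.3, 0.1, 0.2]]
harmful = [[2.1, 0.2, 0.1, 0.3], [1.9, 0.3, 0.0, 0.2]]
direction = refusal_direction(harmful, harmless)

print(safety_signal_present(harmful, direction))   # True: signal intact

# Simulate a tampered copy whose refusal signal has been stripped out.
stripped = [ablate(a, direction) for a in harmful]
print(safety_signal_present(stripped, direction))  # False: signal missing
```

Even in this toy, prompts pass through the model so their activations can be read; what gets skipped is generating and grading the model's answers. The check itself is just a few averages and dot products, which hints at why an inspection of this kind can be fast. The AMS project itself is the place to check how it really works.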

Why does this matter for a business owner? If you're building anything on top of open-weight models — a customer chatbot, an internal tool, an automated workflow — you're trusting that the model underneath behaves as advertised. Until now, verifying that was slow and unreliable. This makes it practical.

It's the kind of quiet, infrastructure-level thing that doesn't make headlines, but that you'll be glad exists when something goes wrong.

Words worth knowing

Open-weight model — An AI model whose weights (the learned numbers that determine how it behaves) are publicly released, so anyone can download, modify, and redistribute it. Think of it like open-source software, but for AI.

Safety training — The process of teaching an AI model to refuse harmful requests. It's baked in during training, but it can be deliberately removed.

Abliterated model — A version of an AI model that has had its safety training surgically removed. These circulate online and are sometimes used to bypass content policies.

Supply-chain risk — In AI, this means the risk that a model or tool you're using was tampered with before it reached you, much like counterfeit parts slipping into a manufacturer's supply chain.

Written by David at AC0.AI. Follow on @ac0hero