Imagine you run a platform where people upload files — contracts, photos, CVs, spreadsheets. You probably assume your system knows what it's receiving. It often doesn't, not really.
For decades, software has guessed file types by peeking at the first few bytes of a file — a rough trick that breaks constantly with modern formats. Google built something better, trained it on a hundred million real files, and now uses it to route every attachment that passes through Gmail and Drive through their security scanners. Hundreds of billions of files a week.
They've now released it as a free, open tool called Magika. It can tell a TypeScript file from a JavaScript one, a JSON file from a near-identical variant — distinctions that older tools simply miss. It does all this in about five milliseconds per file.
For most business owners, this isn't something you'd touch directly. But if you're building any kind of product that accepts uploads — a client portal, a document tool, an AI assistant that reads files — the developers you work with will want to know about this. It's the kind of foundational piece that quietly makes everything else more reliable.
The fact that Google is releasing the same engine they use internally, for free, says something about where infrastructure is heading.
Open-source — Software where the underlying code is made public and free to use. Think of it like a recipe being shared rather than kept secret.
File type detection — When a system figures out what kind of file something is (a PDF, an image, a spreadsheet). Usually done automatically, and usually imperfectly.
Deep-learning model — A type of AI trained by showing it enormous amounts of examples until it learns patterns a human couldn't write down. Magika learned to recognise files the same way you'd learn to recognise handwriting — by seeing millions of samples.
If you're building a product that handles file uploads, bring this up with your tech team. Ask them what you're currently using to detect file types — and whether it's ever caused problems.