Here’s How We Helped a Global FMCG Brand Process 100,000+ Product Labels for Compliance with AI Without Losing Accuracy

When a global FMCG manufacturer came to us, the problem wasn’t a lack of data. It was too much of it, trapped in the wrong format.

Every batch of product that left their manufacturing facilities came with a product label: a PDF capturing ingredient lists, nutritional information, regulatory compliance data, and certification sign-offs. Multiply that across multiple production lines and years of operations, and the total crossed 100,000 documents. Each one needed to be reviewed, have its key fields extracted, and be validated against regulatory standards before it could be archived or fed into downstream compliance systems.

Manually, this was being done by a regulatory and compliance team reading each PDF, copying values into spreadsheets or systems of record, and cross-checking for errors. It worked, but it didn’t scale. Every new product line, every audit cycle, every new market added more labels to a queue that was already too long.

The brief from the client was direct: automate the extraction and validation of these product labels using AI document processing, without compromising the accuracy their global compliance processes depended on.

Why this isn’t a simple OCR problem

On paper, the solution looks straightforward: run the PDFs through an AI extraction pipeline, check a confidence score, and approve. We’ve seen that assumption before, and it doesn’t survive contact with real manufacturing documents.

FMCG product label documents are messy in ways that generic OCR tools aren’t built for. Some are clean, printed label proofs. Many aren’t. Regulatory officers write corrections by hand in the margins. Text gets struck through and rewritten when a formulation or compliance requirement changes. Stamps overlap printed text. Some documents are scans of carbon-copy forms that are barely legible. A pipeline that only handles clean printed text would have failed on a meaningful fraction of this client’s real-world documents, and in a compliance-driven industry like FMCG, “meaningful fraction” isn’t a number you can shrug off.

So before writing a line of extraction logic, we spent time with the client’s regulatory team understanding what these labels actually look like in practice, not just what they look like in the best case.

The confidence score trap in document AI

The first version of any document AI pipeline tends to use a single confidence score per document: if the model is, say, 90% confident overall, approve it; if not, send it to a human.

This approach failed us early, and it’s worth explaining why. A document can score 95% confidence overall while one specific field, say, a regulatory license number or a critical allergen warning, is wrong. The overall score is an average across every field on the page. A handful of easy, obvious fields can pull the average up and mask the one field that’s both difficult to read and the one that actually matters for certification.
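To make the masking effect concrete, here is a minimal sketch with made-up numbers. The field names, scores, and the 0.90 cut-off are illustrative assumptions, not values from the client’s data, and the sketch assumes the document score is a plain mean of the per-field scores.

```python
from statistics import mean

# Hypothetical per-field confidence scores for one label (illustrative values only).
field_confidence = {
    "product_name": 0.99,
    "net_weight": 0.99,
    "batch_number": 0.98,
    "manufacture_date": 0.99,
    "ingredient_list": 0.97,
    "nutrition_energy_kcal": 0.98,
    "country_of_origin": 0.99,
    "certification_signoff": 0.98,
    "regulatory_license_number": 0.99,
    "allergen_warning": 0.62,  # hard to read, and the field that matters most
}

DOCUMENT_THRESHOLD = 0.90  # assumed document-level approval cut-off

# Document-level view: one averaged score for the whole page.
document_score = mean(field_confidence.values())

# Field-level view: the single weakest field on the page.
weakest_field = min(field_confidence, key=field_confidence.get)

print(f"document score: {document_score:.3f}")
print(f"auto-approved at document level: {document_score >= DOCUMENT_THRESHOLD}")
print(f"weakest field: {weakest_field} ({field_confidence[weakest_field]:.2f})")
```

With these numbers the document averages roughly 95% and would be approved, even though the allergen field sits far below any sensible cut-off.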

We moved to field-level confidence scoring instead. Every individual field, not the document as a whole, gets its own confidence score. Only the fields that fall below a defined threshold get routed to a human reviewer. Everything else is auto-approved.
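In code, the routing rule is small. The sketch below is an assumption about how such a rule could look, not the production implementation: `ExtractedField`, `route_fields`, and the 0.90 threshold are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass

FIELD_THRESHOLD = 0.90  # assumed per-field cut-off, for illustration only


@dataclass
class ExtractedField:
    name: str
    value: str
    confidence: float


def route_fields(fields, threshold=FIELD_THRESHOLD):
    """Split extracted fields into auto-approved and needs-review lists."""
    auto_approved = [f for f in fields if f.confidence >= threshold]
    needs_review = [f for f in fields if f.confidence < threshold]
    return auto_approved, needs_review


# Hypothetical extraction output for one label.
fields = [
    ExtractedField("product_name", "Oat Biscuits 200g", 0.99),
    ExtractedField("regulatory_license_number", "LIC-00000", 0.97),
    ExtractedField("allergen_warning", "Contains: wheat, m1lk", 0.62),
]

approved, flagged = route_fields(fields)
print("auto-approved:", [f.name for f in approved])
print("sent to reviewer:", [f.name for f in flagged])
```

Only the allergen field reaches a reviewer here; the other two are approved without anyone opening the document.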

This single change had an outsized impact. It meant reviewers weren’t opening entire documents to double-check fields the AI had already extracted correctly. They were looking at the two or three fields, out of often twenty or more, that the system genuinely wasn’t sure about. That’s the difference between a reviewer re-reading a whole document and a reviewer spending fifteen seconds on a flagged field.

Handling handwriting, stamps, and strikeouts

Product label proofs in FMCG environments are working documents, not pristine forms. Officers strike out an initial value and write the corrected one beside it. Stamps get applied for “approved” or “retested” and partially obscure the field underneath. Scans vary wildly in resolution depending on which facility produced them.

We built specific handling for each of these patterns rather than treating them as OCR noise to average out. One detail surprised us during testing: in several cases, the AI correctly identified a struck-out value and chose not to extract it, treating it as superseded, which is the right behaviour, but it meant the validation workflow needed an explicit path for “a correction was made here” rather than treating every strikeout as an extraction failure. We added a separate review flag specifically for documents with visible corrections, so reviewers could quickly confirm the AI had picked up the corrected value rather than the original one.
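One way to model that separate path is to record why something is flagged, so a visible correction is never confused with a low-confidence read. This is a sketch under assumptions: the `ReviewReason` values, the `superseded_value` attribute, and the threshold are hypothetical, and it presumes the extraction step can report struck-out text alongside the value it chose.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ReviewReason(Enum):
    LOW_CONFIDENCE = "low_confidence"
    CORRECTION_PRESENT = "correction_present"


@dataclass
class FieldResult:
    name: str
    value: str
    confidence: float
    superseded_value: Optional[str] = None  # struck-out text, if one was detected


def review_reasons(field, threshold=0.90):
    """Return every reason this field should be shown to a reviewer."""
    reasons = []
    if field.confidence < threshold:
        reasons.append(ReviewReason.LOW_CONFIDENCE)
    if field.superseded_value is not None:
        # A strikeout is not an extraction failure: the reviewer only confirms
        # that the corrected value, not the original, was picked up.
        reasons.append(ReviewReason.CORRECTION_PRESENT)
    return reasons


def document_has_corrections(fields):
    """Document-level flag: true if any field carries a visible correction."""
    return any(f.superseded_value is not None for f in fields)


corrected = FieldResult("net_weight", "250 g", 0.96, superseded_value="200 g")
clean = FieldResult("batch_number", "B-0000", 0.98)

print(review_reasons(corrected))
print(review_reasons(clean))
print(document_has_corrections([corrected, clean]))
```

The corrected field is confidently extracted and still reaches a reviewer, but with a reason that tells them exactly what to confirm.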

Keeping humans in the loop, deliberately

None of this was about removing the regulatory team from the process. It was about changing what they spent their time on.

Before the pipeline went live, every workflow was validated by the client’s own domain experts against their existing manual process. We tested AI output against documents that had already been reviewed manually, comparing field by field, until the extraction accuracy held up consistently across different label types, facilities, and scan qualities. That validation phase took real time, and we treated it as non-negotiable: a pipeline that looks accurate in a demo and a pipeline that’s accurate enough to trust for global compliance certification are not the same thing.
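A field-by-field comparison of this kind can be sketched in a few lines. The function names, the matching rule (ignoring case and extra whitespace), and the sample data below are assumptions for illustration; a real benchmark would need matching rules agreed with the domain experts.

```python
from collections import defaultdict


def normalise(value):
    """Compare values ignoring case and extra whitespace (an assumed matching rule)."""
    return " ".join(str(value).split()).casefold()


def field_accuracy(extracted_docs, benchmark_docs):
    """Per-field accuracy of AI output against manually reviewed values.

    Both arguments map document id -> {field name -> value}.
    Only fields present in the manual benchmark are scored.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for doc_id, expected_fields in benchmark_docs.items():
        extracted_fields = extracted_docs.get(doc_id, {})
        for name, expected in expected_fields.items():
            total[name] += 1
            # A missing extraction counts as wrong.
            if name in extracted_fields and normalise(extracted_fields[name]) == normalise(expected):
                correct[name] += 1
    return {name: correct[name] / total[name] for name in total}


# Hypothetical manually reviewed values and AI output for two labels.
benchmark = {
    "doc-1": {"net_weight": "200 g", "allergen_warning": "Contains: wheat, milk"},
    "doc-2": {"net_weight": "500 g", "allergen_warning": "Contains: soy"},
}
extracted = {
    "doc-1": {"net_weight": "200 g", "allergen_warning": "contains: wheat, milk"},
    "doc-2": {"net_weight": "500 g", "allergen_warning": "Contains: soya"},
}

for name, accuracy in field_accuracy(extracted, benchmark).items():
    print(f"{name}: {accuracy:.0%}")
```

Running the same comparison separately per label type, facility, or scan quality shows whether accuracy holds up across each slice rather than only in aggregate.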

Once live, the role of the reviewer shifted. Instead of reading every document end to end, reviewers worked from a queue of flagged fields, the ones the AI genuinely wasn’t confident about, plus documents with visible corrections or unusual formatting. The skilled people on the team were doing the same kind of judgment work they always had, just applied to a much smaller, much more relevant set of cases.

The outcome: 80% reduction in manual review effort

Over 100,000 documents went through the pipeline.
Extraction accuracy came in above 90% at the field level, validated against the client’s own manual benchmarks.
Manual review effort dropped by roughly 80%, since reviewers were only looking at flagged fields and flagged documents rather than every page.
Processing time, which had been a persistent bottleneck for the compliance team, dropped significantly.
The backlog that used to grow with every new audit cycle stopped growing.

The result that mattered most to the client wasn’t a percentage. Their quality team, people who’d spent years building the judgment to know when a value looks off, were finally spending that judgment on the documents that actually needed it.

What we’d tell anyone considering AI document processing

If there’s one lesson from this project worth repeating, it’s that document AI in a regulated or quality-sensitive industry isn’t a model selection problem. It’s a workflow design problem. The extraction model is one part of a system that also needs field-level confidence scoring, a clear escalation path for ambiguous cases like handwriting and corrections, and a validation phase rigorous enough that the people relying on it actually trust it.

That combination is what turns “we ran AI on our documents” into a process that a compliance team will actually stand behind.

If you’re sitting on a backlog of documents (quality reports, compliance records, lab results, or anything similar) and manual review has become the bottleneck, we’d be glad to talk through what a workflow like this could look like for your data.
