
Project deep dive

Building PolicyTrace: A Document AI Workflow Beyond PDF-to-JSON

PolicyTrace is a practical Document AI workflow for UK motor insurance PDFs, built to show what happens after a model returns JSON.

Most AI document demos stop at the moment the model returns JSON.

That is a useful starting point, but it skips the part that usually decides whether the system can be trusted: parsing the source document, validating the output, handling sensitive data, resolving disagreements across documents, showing evidence, and giving a reviewer a way to correct the result.

PolicyTrace is my first AI Tool Stack project, and it is built around that messy middle.

It is a Document AI workflow for UK motor insurance PDFs. It takes a policy document pack, extracts a structured Golden Record, matches fields back to the source PDF, and gives a human reviewer a split-screen interface for checking the result.

Live demo:

https://huggingface.co/spaces/AItoolstack/AI-PolicyTrace

Source code:

https://github.com/AItoolstack/ai-policytrace

Project page:

/policytrace/

Why This Project Exists

The easy version of this problem is:

Upload a PDF and ask a model to return JSON.

The harder version is:

Can this become a workflow that someone could inspect, test, deploy, and improve?

That second question changes the design.

For a real document workflow, the JSON is only one piece. The system also needs to know which document a field came from, whether another document disagreed, whether the value fits the expected schema, and whether a reviewer can trace the value back to the original PDF.

PolicyTrace uses UK motor insurance documents because they are a good practical test case. A policy pack can include a Schedule of Insurance, Certificate of Motor Insurance, Statement of Fact, and Policy Booklet. These documents overlap, but they are not equally authoritative for every field.

That makes the workflow more interesting than simple extraction.

What PolicyTrace Does

PolicyTrace processes a policy PDF pack and creates a structured Golden Record.

The current workflow:

  • Uploads multiple insurance PDFs.
  • Converts PDF text and layout into a usable representation.
  • Masks configured PII before model calls.
  • Classifies each document type.
  • Extracts typed fields into a Pydantic schema.
  • Merges records using document authority rules.
  • Detects conflicts between source documents.
  • Matches extracted fields back to PDF locations.
  • Lets reviewers verify, flag, or override fields.
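To make the masking step in the list above concrete, here is a minimal sketch of replacing PII with placeholders before any text reaches a model. It is a simplified stand-in, not PolicyTrace's code: the project uses Presidio and spaCy, while this sketch uses two hand-written regular expressions. The names `PII_PATTERNS` and `mask_pii`, the placeholder format, and the returned mapping are all assumptions made for illustration.

```python
import re

# Hypothetical patterns for illustration only. PolicyTrace itself relies on
# Presidio and spaCy; these regexes are a deliberately small stand-in.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "UK_POSTCODE": re.compile(r"\b[A-Z]{1,2}\d[A-Z\d]?\s?\d[A-Z]{2}\b"),
}


def mask_pii(text: str) -> tuple[str, dict[str, str]]:
    """Replace each match with a numbered placeholder.

    Returns the masked text and a placeholder -> original value mapping.
    """
    mapping: dict[str, str] = {}
    for label, pattern in PII_PATTERNS.items():
        # Bind the current label as a default so each callback keeps its own.
        def replace(match: re.Match, label: str = label) -> str:
            token = f"<{label}_{len(mapping) + 1}>"
            mapping[token] = match.group(0)
            return token

        text = pattern.sub(replace, text)
    return text, mapping


masked, mapping = mask_pii("Contact jane.doe@example.com at SW1A 1AA.")
```

Only `masked` would be sent to the model in this sketch. Keeping the mapping on the server side is one possible way to restore values for the reviewer later; the post does not say whether PolicyTrace does this.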

The goal is not to pretend the system is finished production software. The goal is to show the real shape of a deployable AI workflow.

The Architecture At A Glance

PolicyTrace has a Python backend and a React frontend.

The backend uses FastAPI for the API, Docling for PDF text and layout conversion, Presidio and spaCy for PII masking, Groq and Instructor for structured extraction, Pydantic for typed outputs, and custom arbitration/provenance logic.

The frontend uses React, Vite, Tailwind, react-pdf, and Zustand. It gives the reviewer a source PDF on one side and the extracted record on the other.

The important design decision is separation of concerns.

The model extracts structured values, but it does not decide everything. Pydantic validates the shape. The arbiter applies source authority rules. The provenance matcher tries to connect fields back to source PDF geometry. The UI keeps the reviewer in the loop.

That separation is what turns the project from a prompt demo into a workflow.

Why Evidence Matters

If a system extracts a policy start date, premium, vehicle registration, or driver detail, the next question is obvious:

Where did that come from?

Without evidence, the reviewer has to trust the model or manually search the PDF.

PolicyTrace tries to reduce that gap. During extraction, the prompts ask for source phrases alongside the structured values. Those phrases are used internally to match fields back to PDF text geometry. The final reviewer sees the extracted value and the source location together.

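This is not a legal-grade guarantee. Provenance matching can still fail, especially with messy PDFs or with transformed values, for example when the extracted value is formatted differently from the text printed in the document. But it is a better workflow than returning JSON without a trail.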
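A rough sketch of the matching idea, assuming the PDF parser yields a flat list of words with page numbers and bounding boxes. The `Word` type, `normalise`, and `locate_phrase` are hypothetical names, and PolicyTrace's actual matcher may work quite differently. The sketch looks for the first run of consecutive words that equals the source phrase after light normalisation and returns the union of their boxes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Word:
    """Hypothetical parser output: one word and where it sits on the page."""
    text: str
    page: int
    bbox: tuple  # (x0, y0, x1, y1)


def normalise(token: str) -> str:
    """Lowercase and drop punctuation so 'date:' matches 'Date'."""
    return "".join(ch for ch in token.lower() if ch.isalnum())


def locate_phrase(phrase: str, words: list) -> Optional[tuple]:
    """Return (page, bbox) for the first same-page run matching the phrase."""
    target = [t for t in (normalise(p) for p in phrase.split()) if t]
    if not target:
        return None
    tokens = [normalise(w.text) for w in words]
    n = len(target)
    for start in range(len(words) - n + 1):
        run = words[start:start + n]
        same_page = len({w.page for w in run}) == 1
        if same_page and tokens[start:start + n] == target:
            # Union of the word boxes gives the region to highlight.
            bbox = (
                min(w.bbox[0] for w in run),
                min(w.bbox[1] for w in run),
                max(w.bbox[2] for w in run),
                max(w.bbox[3] for w in run),
            )
            return run[0].page, bbox
    return None
```

An exact-run match like this returns `None` when the model's phrase differs from the printed text, which is one way the failure mode described above shows up. A more forgiving matcher would need fuzzy comparison, at the cost of occasional false highlights.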

The Golden Record Problem

Insurance documents overlap.

The Schedule may be the strongest source for vehicle details and the financial summary. The Certificate may be stronger for driving entitlement and class of use. A Statement of Fact may include risk details that do not appear elsewhere.

A naive merge would hide these differences.

PolicyTrace uses an arbiter to merge fields according to a hierarchy of truth. When documents disagree, the conflict is recorded instead of silently buried.
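The arbitration idea can be sketched in a few lines. This is an illustration under stated assumptions, not the project's arbiter: the `AUTHORITY` table, the field names, the document type keys, and the `Conflict` shape are all hypothetical. Each field has an ordered list of document types, the first document that supplies a value wins, and any lower-ranked document with a different value is recorded as a conflict.

```python
from dataclasses import dataclass

# Hypothetical authority order per field: earlier document types win.
AUTHORITY = {
    "vehicle_registration": ["schedule", "certificate", "statement_of_fact"],
    "class_of_use": ["certificate", "schedule", "statement_of_fact"],
}


@dataclass
class Conflict:
    field_name: str
    chosen_source: str
    chosen_value: str
    rejected: dict  # losing document type -> its differing value


def arbitrate(records: dict) -> tuple:
    """Merge per-document records into one golden record.

    `records` maps document type -> {field name: extracted value}.
    Returns (golden_record, conflicts).
    """
    golden = {}
    conflicts = []
    for field_name, order in AUTHORITY.items():
        # Collect values in authority order, skipping documents without one.
        candidates = [
            (doc, records[doc][field_name])
            for doc in order
            if records.get(doc, {}).get(field_name) is not None
        ]
        if not candidates:
            continue
        source, value = candidates[0]
        golden[field_name] = value
        # Keep disagreements visible instead of discarding them.
        rejected = {doc: v for doc, v in candidates[1:] if v != value}
        if rejected:
            conflicts.append(Conflict(field_name, source, value, rejected))
    return golden, conflicts
```

The design choice worth noting is that the winning value and the losing values are both kept. The golden record stays simple, while the conflict list gives the reviewer something specific to check.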

That idea matters outside insurance too. The same pattern appears in claims, compliance, onboarding, lending, legal review, and other document-heavy workflows.

The Human Review Loop

PolicyTrace includes a review UI because extraction is not the end of the workflow.

The reviewer can inspect each field, see matching PDF evidence, verify fields, flag questionable values, and override extracted values when needed.
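One way to model those reviewer actions is a small per-field state object. The sketch below is hypothetical: `ReviewStatus`, `ReviewedField`, and the method names are invented for illustration and do not describe PolicyTrace's actual frontend store or API. The point is that an override never erases the extracted value, so the model's output and the human decision stay distinguishable.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ReviewStatus(str, Enum):
    UNREVIEWED = "unreviewed"
    VERIFIED = "verified"
    FLAGGED = "flagged"
    OVERRIDDEN = "overridden"


@dataclass
class ReviewedField:
    """Hypothetical review state for one extracted field."""
    name: str
    extracted_value: str
    status: ReviewStatus = ReviewStatus.UNREVIEWED
    override_value: Optional[str] = None
    note: Optional[str] = None

    @property
    def final_value(self) -> str:
        """The reviewer's value if overridden, otherwise the extracted one."""
        if self.status is ReviewStatus.OVERRIDDEN and self.override_value is not None:
            return self.override_value
        return self.extracted_value

    def verify(self) -> None:
        self.status = ReviewStatus.VERIFIED

    def flag(self, note: str) -> None:
        self.status = ReviewStatus.FLAGGED
        self.note = note

    def override(self, value: str, note: Optional[str] = None) -> None:
        # The extracted value is kept so the change remains auditable.
        self.status = ReviewStatus.OVERRIDDEN
        self.override_value = value
        self.note = note
```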

This changes the role of the model. The model becomes part of a reviewable system instead of an invisible decision-maker.

For many practical workflows, that is the right direction: automate the first pass, preserve evidence, surface uncertainty, and let humans handle exceptions.

Try It

The public demo is available on Hugging Face Spaces:

https://huggingface.co/spaces/AItoolstack/AI-PolicyTrace

The source code is on GitHub:

https://github.com/AItoolstack/ai-policytrace

For public demos, use the synthetic PDF pack included in the repository. Do not upload real customer documents to a public demo unless proper retention, access, and security controls are in place.

What Comes Next

PolicyTrace is the first project in the AI Tool Stack library.

The next posts will go deeper into the parts that make this more than PDF-to-JSON:

  • Why Document AI needs evidence.
  • How the architecture is structured.
  • How Golden Record arbitration works.
  • What the review UI is designed to solve.
  • How the project is deployed with Docker and Hugging Face Spaces.

The point of this series is not to present a perfect final system. It is to show the engineering path from demo to something real enough to inspect, break, improve, and deploy.