PolicyTrace is a Document AI workflow for UK motor insurance PDFs.
It takes a pack of policy documents, extracts structured data from each one, resolves overlapping fields across documents into a single Golden Record, and gives reviewers source-level evidence inside a split-screen PDF audit UI.
This project is part of AI Tool Stack: practical AI builds, deployable workflows, and lessons beyond the demo.
Try It
Use the public demo with the synthetic PDFs from the repository:
Open PolicyTrace on Hugging Face
For public demos, do not upload real customer documents unless proper retention, access, and security controls are in place.
Why This Project Exists
Many AI document demos stop once a model returns JSON. That is useful as a prototype, but real document workflows need more:
- PDF parsing that survives real layouts.
- Typed outputs that downstream systems can trust.
- PII handling before model calls.
- Multi-document source authority rules.
- Conflict detection.
- Field-level evidence.
- A human review loop.
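To make "PII handling before model calls" concrete, here is a minimal Python sketch of one common approach: replace sensitive spans with numbered placeholders before text reaches a model, and keep a reverse map so values can be restored afterwards. The project's stack includes Microsoft Presidio and spaCy; the regex patterns and the `mask_pii` / `unmask` helpers below are simpler, hypothetical stand-ins, not PolicyTrace's configured entities.

```python
import re

# Hypothetical patterns for illustration only; not PolicyTrace's configured
# entities. Real detection in the project's stack uses Presidio and spaCy.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "UK_POSTCODE": re.compile(r"\b[A-Z]{1,2}\d[A-Z\d]?\s*\d[A-Z]{2}\b"),
    "PHONE": re.compile(r"\b0\d{3}\s?\d{3}\s?\d{4}\b"),
}


def mask_pii(text: str) -> tuple[str, dict[str, str]]:
    """Replace PII matches with placeholders such as <EMAIL_1>.

    Returns the masked text and a map from placeholder to original value.
    """
    mapping: dict[str, str] = {}
    for label, pattern in PII_PATTERNS.items():
        count = 0

        def substitute(match: re.Match) -> str:
            # Called immediately by pattern.sub, so `label` is the current one.
            nonlocal count
            count += 1
            token = f"<{label}_{count}>"
            mapping[token] = match.group(0)
            return token

        text = pattern.sub(substitute, text)
    return text, mapping


def unmask(text: str, mapping: dict[str, str]) -> str:
    """Restore original values, e.g. in model output that echoes placeholders."""
    for token, original in mapping.items():
        text = text.replace(token, original)
    return text


if __name__ == "__main__":
    masked, mapping = mask_pii("Contact jane@example.com at SW1A 1AA or 0207 946 0000.")
    print(masked)  # Contact <EMAIL_1> at <UK_POSTCODE_1> or <PHONE_1>.
```

Only the masked text would be sent to the model; the mapping stays on the server side.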
PolicyTrace shows that fuller path using a realistic UK motor insurance pack.
What It Does
PolicyTrace can process a set of insurance PDFs such as:
- Schedule of Insurance
- Certificate of Motor Insurance
- Statement of Fact
- Policy Booklet
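The workflow classifies each PDF as one of these document types. How PolicyTrace does this is not described here, so the following is only a stand-in: a hypothetical keyword-scoring sketch in Python. The keyword lists and `classify_document` are illustrative assumptions, not the project's rules.

```python
# Hypothetical keyword lists; PolicyTrace's actual classification method
# is not described in this README.
DOC_TYPE_KEYWORDS: dict[str, list[str]] = {
    "schedule": ["schedule of insurance", "premium", "excess"],
    "certificate": ["certificate of motor insurance", "road traffic act", "entitled to drive"],
    "statement_of_fact": ["statement of fact", "claims", "convictions"],
    "policy_booklet": ["policy booklet", "general conditions", "exclusions"],
}


def classify_document(text: str) -> str:
    """Return the document type with the most keyword hits, or 'unknown'."""
    lowered = text.lower()
    scores = {
        doc_type: sum(keyword in lowered for keyword in keywords)
        for doc_type, keywords in DOC_TYPE_KEYWORDS.items()
    }
    # max() returns the first type in dict order when scores tie.
    best_type, best_score = max(scores.items(), key=lambda item: item[1])
    return best_type if best_score > 0 else "unknown"
```

Because ties fall back to dictionary order and keywords overlap between document types, a rule set like this is best treated as a baseline to compare a model-based classifier against.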
The workflow then:
- Converts PDFs into text and layout data.
- Masks configured PII before extraction.
- Classifies document types.
- Extracts typed structured data.
- Merges fields into a Golden Record.
- Detects conflicts between documents.
- Matches extracted values back to source PDF locations.
- Lets a reviewer verify, flag, or override fields.
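The merge and conflict-detection steps can be sketched as follows. This is a minimal Python illustration that assumes a single hypothetical authority order (`AUTHORITY`) across document types; PolicyTrace's actual source authority rules are not spelled out here and may differ per field. The most authoritative source supplies the Golden Record value, and any disagreement between sources is marked as a conflict for review instead of being silently dropped.

```python
from dataclasses import dataclass, field

# Hypothetical authority order: earlier entries win. Not PolicyTrace's real rules.
AUTHORITY = ["certificate", "schedule", "statement_of_fact", "policy_booklet"]


@dataclass
class Candidate:
    field_name: str
    value: str
    source: str  # document type the value was extracted from


@dataclass
class ResolvedField:
    value: str
    source: str
    conflict: bool
    alternatives: list[Candidate] = field(default_factory=list)


def build_golden_record(candidates: list[Candidate]) -> dict[str, ResolvedField]:
    # Group candidate values by field name.
    by_field: dict[str, list[Candidate]] = {}
    for cand in candidates:
        by_field.setdefault(cand.field_name, []).append(cand)

    record: dict[str, ResolvedField] = {}
    for name, cands in by_field.items():
        # Rank by source authority; an unknown source raises ValueError.
        ranked = sorted(cands, key=lambda c: AUTHORITY.index(c.source))
        winner = ranked[0]
        # Conflict: sources disagree after trimming whitespace and case.
        distinct = {c.value.strip().upper() for c in cands}
        record[name] = ResolvedField(
            value=winner.value,
            source=winner.source,
            conflict=len(distinct) > 1,
            alternatives=ranked[1:],
        )
    return record


if __name__ == "__main__":
    record = build_golden_record([
        Candidate("registration", "AB12 CDE", "schedule"),
        Candidate("registration", "AB12 CDE", "certificate"),
        Candidate("policy_number", "PT-1001", "schedule"),
        Candidate("policy_number", "PT-1007", "statement_of_fact"),
    ])
    reg, num = record["registration"], record["policy_number"]
    print(reg.value, reg.source, reg.conflict)  # AB12 CDE certificate False
    print(num.value, num.conflict)  # PT-1001 True
```

Keeping the losing candidates in `alternatives` is what lets a reviewer see both sides of a conflict rather than only the winner.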
The System Shape
The application has two main parts:
- A Python/FastAPI backend for PDF conversion, extraction, arbitration, provenance matching, and session storage.
- A React review UI for upload, PDF inspection, field highlighting, verification, overrides, and flags.
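The verify, flag, and override actions can be modelled as a small per-field state on the backend. The sketch below is hypothetical (the `ReviewedField` type and its methods are not taken from the PolicyTrace codebase); it shows one way to keep the extracted value intact while an override decides what downstream systems see, with a simple action history per field.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class ReviewStatus(Enum):
    PENDING = "pending"
    VERIFIED = "verified"
    FLAGGED = "flagged"
    OVERRIDDEN = "overridden"


@dataclass
class ReviewedField:
    """Hypothetical per-field review state; not PolicyTrace's actual model."""
    extracted_value: str
    status: ReviewStatus = ReviewStatus.PENDING
    override_value: Optional[str] = None
    history: list[tuple[str, str]] = field(default_factory=list)  # (action, note)

    @property
    def effective_value(self) -> str:
        # A reviewer override wins; the model's value is never overwritten.
        return self.override_value if self.override_value is not None else self.extracted_value

    def verify(self, note: str = "") -> None:
        self.status = ReviewStatus.VERIFIED
        self.history.append(("verify", note))

    def flag(self, note: str) -> None:
        # Flagging keeps the current value but marks it for follow-up.
        self.status = ReviewStatus.FLAGGED
        self.history.append(("flag", note))

    def override(self, value: str, note: str) -> None:
        self.status = ReviewStatus.OVERRIDDEN
        self.override_value = value
        self.history.append(("override", note))
```

For example, `ReviewedField("PT-1001").override("PT-1002", "Schedule shows PT-1002")` leaves `extracted_value` as `"PT-1001"` while `effective_value` becomes `"PT-1002"`.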
The stack includes Docling, Groq, Instructor, Pydantic, Microsoft Presidio, spaCy, FastAPI, React, Vite, Tailwind, react-pdf, Zustand, Docker, and Hugging Face Spaces.
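In this stack, Instructor and Pydantic are the pieces that turn model output into typed data. As a rough, stdlib-only illustration of the pattern Instructor automates (parse the response into a schema, and re-ask the model with the validation error when parsing fails), here is a sketch using a dataclass. The `PolicySummary` fields, the simplified registration rule, and `extract_with_retries` are hypothetical, and `call_model` stands in for a real Groq client call.

```python
import json
import re
from dataclasses import dataclass
from datetime import date
from typing import Callable


@dataclass(frozen=True)
class PolicySummary:
    # Hypothetical fields for illustration; not PolicyTrace's actual schema.
    policy_number: str
    registration: str
    start_date: date

    @classmethod
    def from_json(cls, raw: str) -> "PolicySummary":
        data = json.loads(raw)  # json.JSONDecodeError is a ValueError
        reg = data["registration"].replace(" ", "").upper()
        # Simplified rule: current-style UK plates only (e.g. AB12CDE).
        if not re.fullmatch(r"[A-Z]{2}\d{2}[A-Z]{3}", reg):
            raise ValueError(f"unexpected registration format: {data['registration']!r}")
        return cls(
            policy_number=str(data["policy_number"]).strip(),
            registration=reg,
            start_date=date.fromisoformat(data["start_date"]),
        )


def extract_with_retries(
    call_model: Callable[[str], str], prompt: str, max_retries: int = 2
) -> PolicySummary:
    """Validate model output; on failure, re-ask with the error appended."""
    last_error: Exception | None = None
    for _ in range(max_retries + 1):
        message = prompt if last_error is None else f"{prompt}\nPrevious error: {last_error}"
        try:
            return PolicySummary.from_json(call_model(message))
        except (ValueError, KeyError, TypeError, AttributeError) as exc:
            last_error = exc
    raise RuntimeError(f"extraction failed after {max_retries + 1} attempts: {last_error}")


if __name__ == "__main__":
    # Fake model: the first answer fails validation, the second passes.
    responses = iter([
        '{"policy_number": "PT-1001", "registration": "not a plate", "start_date": "2024-03-01"}',
        '{"policy_number": "PT-1001", "registration": "ab12 cde", "start_date": "2024-03-01"}',
    ])
    summary = extract_with_retries(lambda _prompt: next(responses), "Extract the policy summary.")
    print(summary.registration, summary.start_date)  # AB12CDE 2024-03-01
```

The point of the typed boundary is that downstream code only ever sees a `PolicySummary` whose fields already passed validation, or an explicit failure.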
The Important Design Choice
PolicyTrace treats evidence as part of the product, not an afterthought.
The system does not only say:
This is the policy number.
It also tries to answer:
Where did that value come from, and can a human reviewer inspect the original source?
That is the difference between a prompt demo and a reviewable workflow.
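One simple way to answer "where did that value come from?" is to search for the extracted value in the per-page text and return its location plus a snippet for the reviewer. The sketch below assumes plain per-page strings and character offsets. PolicyTrace converts PDFs into text and layout data and highlights fields in the PDF viewer; mapping matches to layout regions is beyond this hypothetical `find_evidence` helper.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Evidence:
    page: int     # 1-based page number
    start: int    # character offset of the match in that page's text
    end: int
    snippet: str  # surrounding text to show the reviewer


def find_evidence(value: str, pages: list[str], context: int = 20) -> Optional[Evidence]:
    """Locate an extracted value in page text, tolerating case and spacing."""
    tokens = [re.escape(token) for token in value.split()]
    if not tokens:
        return None
    # "AB12 CDE" also matches "ab12cde" or "AB12  CDE" in the source.
    pattern = re.compile(r"\s*".join(tokens), re.IGNORECASE)
    for page_no, text in enumerate(pages, start=1):
        match = pattern.search(text)
        if match:
            lo = max(0, match.start() - context)
            hi = min(len(text), match.end() + context)
            return Evidence(page_no, match.start(), match.end(), text[lo:hi])
    return None  # no evidence found: a signal to flag the field for review
```

A value with no matching evidence is itself useful information: it suggests the model may have inferred or hallucinated the value, which is exactly what a reviewer should check.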
Blog Series
Start with these PolicyTrace write-ups:
- Building PolicyTrace: A Document AI Workflow Beyond PDF-to-JSON
- Why Document AI Needs Evidence, Not Just Extracted JSON
- Inside the PolicyTrace Architecture
- The Golden Record Problem: Resolving Conflicts Across Insurance Documents
- Designing the Human Review Loop for AI Extraction
- Deploying PolicyTrace with GitHub, Docker, and Hugging Face Spaces
As each post is published, link it here and tag it with PolicyTrace.
Current Limitations
PolicyTrace is a practical demo and reference implementation, not a finished insurance production system.
Before production use, it would need:
- Authentication and access control.
- Persistent storage policy.
- Audit logs.
- Background processing for long-running extraction.
- Monitoring and error reporting.
- A larger evaluation set.
- Stronger retention controls for sensitive documents.
That is the point of the project: to show the realistic shape of the system, including the parts still needed before production.