PolicyTrace system design chapter 01

PolicyTrace layered architecture with evidence and review loops

A production-minded view of PolicyTrace as a reusable Document AI pattern: parse documents, protect sensitive data, bound the model call, preserve evidence, arbitrate conflicts, and keep a human review path.

Layered Architecture Map

The important story is not a straight pipeline. It is a set of responsibility boundaries around parsing, model calls, evidence artifacts, arbitration, and human review.

Legend: main processing flow, evidence flow, review feedback loop.

User / Reviewer

People inspect extracted policy facts, source evidence, conflicts, and corrections.

Human review boundary

Policy analyst: Uploads the multi-PDF pack.

Reviewer: Checks fields, evidence, and conflicts.

Auditor: Inspects lineage and decisions.

Review decisions: Approvals, overrides, and notes created by the human workflow.

React Review UI

Upload flow, split-screen review, field focus, source preview, and overrides.

Upload workspace: Multi-document pack.

Extraction dashboard: Progress and summary.

Review interface: Approve or correct.

Citation viewer: PDF evidence focus.

Review summary: Results and conflicts.

Visible review state: What the reviewer can inspect, change, and approve.

FastAPI / Session

Session state, extraction endpoints, PDF serving, and review updates.

API endpoints: Upload, extract, review.

Session state: Demo job context.

PDF serving: Document access for UI.

Review updates: Overrides and decisions.

Session artifacts: In-demo state and job outputs, not durable production storage.

Orchestration / Extraction

Docling parse, classification, specialist extraction, and schema validation.

Document ingestion: Load PDFs and metadata.

Docling parse: Markdown, layout, pages.

Classifier: Schedule, SOF, certificate.

Field extraction: Specialist prompts.

Schema validation: Pydantic partial records.

Parsed artifacts: Markdown, page references, layout, and intermediate records.

LLM / Provider

PII-safe prompt construction, Groq and Instructor calls, and typed partial output.

Privacy boundary

Mask + prompt: Configured sensitive values are removed before transfer.

Model boundary

Groq + Instructor: Structured model call.

Typed response: Partial JSON output.

Provider boundary: Only masked prompt context should cross this system line.

Evidence / Artifacts

Uploaded PDFs, geometry, extracted values, citations, and review state.

Uploaded PDFs: Original demo pack.

Parsed geometry: Pages and boxes.

Partial records: Per-document JSON.

Field citations: field_citations.json map.

Review state: Corrections and decisions.

Evidence map: Source matches that let a reviewer inspect why a field exists.

Trust / Arbitration

Authority rules, conflict detection, Golden Record assembly, and publish gate.

Trust boundary

PolicyArbiter: Source authority rules.

Conflict detection: Expose disagreements.

Golden Record: Canonical assembly.

Quality gate: Ready for review use.

Trusted output: Reviewable Golden Record plus visible conflicts and hardening gaps.

Boundary legend

Privacy: PII is removed before the model call.

Model: The provider call is bounded and typed.

Trust: Arbitration happens after extraction.

H: Review decisions write back to state.

Current demo vs production hardening

The map shows what this repo proves without pretending it already has production controls such as auth, RBAC, durable storage, audit logging, monitoring, or deployment gates.

Architecture thesis

PolicyTrace is not a PDF-to-JSON chain. It is a reviewable decision system.

The useful architecture question is not "where does the LLM sit?" It is "where does responsibility change hands?" PolicyTrace becomes interesting when the model is treated as one bounded worker inside a larger system for privacy, evidence, arbitration, and review.

The model produces candidates. The system earns trust.

The diagram separates extraction from confidence. A model response can create a typed partial record, but the final Golden Record should be assembled after source authority, field citations, conflict handling, and reviewer approval are visible.

1. Before the model: parse documents, classify them, and remove configured sensitive values.
2. During the call: ask for typed partial records, not a magical final answer.
3. After the call: preserve artifacts, map citations, arbitrate conflicts, and expose review controls.

Why the layered view matters

The architecture is strongest where it refuses to hide the uncomfortable parts.

Document AI demos often compress everything into a single arrow from PDF to JSON. That hides the failures that matter in production: private data crosses a provider boundary, source evidence gets lost, documents disagree with each other, and nobody knows whether a reviewer corrected the answer.

Privacy is before the provider

PII masking is shown before the Groq/Instructor lane, so privacy is a system boundary rather than a cleanup step after extraction.
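As a rough illustration of masking as a pre-provider step, the sketch below replaces configured sensitive values with placeholders before any prompt is built. It assumes the configured values arrive as a simple label-to-string dictionary; the function name `mask_configured_values`, the placeholder format, and the sample text are hypothetical and not taken from the PolicyTrace repo.

```python
import re


def mask_configured_values(text: str, sensitive: dict[str, str]) -> tuple[str, dict[str, str]]:
    """Replace each configured sensitive value with a placeholder.

    Returns the masked text plus a placeholder-to-value map that stays
    inside the system, so values can be restored after the provider call.
    """
    masked = text
    restore_map: dict[str, str] = {}
    # Mask longer values first so a short value never splits a longer one.
    for label, value in sorted(sensitive.items(), key=lambda kv: -len(kv[1])):
        if not value:
            continue
        placeholder = f"[{label.upper()}]"
        pattern = re.compile(re.escape(value), flags=re.IGNORECASE)
        if pattern.search(masked):
            # A function replacement avoids any escape handling in the placeholder.
            masked = pattern.sub(lambda _match: placeholder, masked)
            restore_map[placeholder] = value
    return masked, restore_map


# Hypothetical sample data for illustration only.
page_text = "Insured: Jane Doe, 12 High Street. Policy holder Jane Doe agrees to the schedule."
masked_text, restore_map = mask_configured_values(
    page_text, {"insured_name": "Jane Doe", "address": "12 High Street"}
)
print(masked_text)
```

Only `masked_text` would be passed to prompt construction; `restore_map` never leaves the process. Literal matching like this only covers values that were configured in advance, which is consistent with the "configured sensitive values" wording above and is not a general PII detector.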

Extraction is typed work

Docling, classification, specialist prompts, and Pydantic validation each have a role. The model does not own the whole workflow.
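To make "typed partial record" concrete, here is a dependency-free sketch. The project uses Pydantic for this step; the example swaps in standard-library dataclasses so it runs anywhere, and the field names (`policy_number`, `inception_date`, `sum_insured`) and the `parse_partial` helper are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class PartialPolicyRecord:
    """One document's candidate values; every field may be missing."""

    document_type: str  # e.g. "schedule", "sof", "certificate"
    policy_number: Optional[str] = None
    inception_date: Optional[date] = None
    sum_insured: Optional[float] = None
    errors: list[str] = field(default_factory=list)


def parse_partial(document_type: str, raw: dict) -> PartialPolicyRecord:
    """Coerce raw model output into typed fields, recording failures instead of raising."""
    record = PartialPolicyRecord(document_type=document_type)
    if raw.get("policy_number"):
        record.policy_number = str(raw["policy_number"]).strip()
    if raw.get("inception_date"):
        try:
            record.inception_date = date.fromisoformat(str(raw["inception_date"]))
        except ValueError:
            record.errors.append("inception_date: not an ISO date")
    if raw.get("sum_insured") is not None:
        try:
            record.sum_insured = float(raw["sum_insured"])
        except (TypeError, ValueError):
            record.errors.append("sum_insured: not a number")
    return record


# Hypothetical model output with one invalid field.
record = parse_partial(
    "schedule",
    {"policy_number": " PT-001 ", "inception_date": "2024-13-01", "sum_insured": "250000"},
)
print(record)
```

The point of the partial shape is that a bad field is recorded and left empty rather than rejecting the whole document, so later layers can still arbitrate over the fields that did validate.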

Evidence has its own path

Uploaded PDFs, geometry, partial records, and field citations move alongside values, giving the reviewer something inspectable.
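One possible shape for a field-to-evidence map is sketched below. The source names a `field_citations.json` artifact but does not show its schema, so the `Citation` fields (document, page, bounding box, matched text) and both helper functions are assumptions.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Citation:
    document: str                             # file name of the uploaded PDF
    page: int                                 # 1-based page number
    bbox: tuple[float, float, float, float]   # x0, y0, x1, y1 in page coordinates
    matched_text: str                         # source text that supports the value


def build_citation_map(citations: dict[str, list[Citation]]) -> str:
    """Serialize field -> citations so a viewer can focus the PDF on the evidence."""
    payload = {name: [asdict(c) for c in cites] for name, cites in citations.items()}
    return json.dumps(payload, indent=2)


def uncited_fields(values: dict[str, object], citations: dict[str, list[Citation]]) -> list[str]:
    """Fields that carry a value but have no supporting evidence."""
    return sorted(
        name for name, value in values.items()
        if value is not None and not citations.get(name)
    )


# Hypothetical values and evidence for illustration only.
values = {"policy_number": "PT-001", "sum_insured": 250000.0, "broker": None}
citations = {
    "policy_number": [
        Citation("schedule.pdf", 1, (72.0, 110.5, 210.0, 124.0), "Policy No: PT-001")
    ],
}
print(build_citation_map(citations))
print(uncited_fields(values, citations))
```

Keeping citations in a separate map, rather than inside each value, lets the evidence be checked independently: a value without an entry is immediately visible as unsupported.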

Trust is outside the model

PolicyArbiter, conflict detection, Golden Record assembly, and the review gate happen after model output exists.
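The following sketch shows one way authority rules and conflict detection could sit after extraction. It is not the PolicyArbiter implementation: the authority order in `AUTHORITY_RANK`, the `Candidate` type, and the `arbitrate` function are hypothetical, and a real rule set may vary authority per field.

```python
from dataclasses import dataclass

# Hypothetical authority order: a lower rank wins when documents disagree.
AUTHORITY_RANK = {"schedule": 0, "certificate": 1, "sof": 2}


@dataclass
class Candidate:
    field_name: str
    value: object
    document_type: str


def arbitrate(candidates: list[Candidate]) -> tuple[dict, list[dict]]:
    """Pick one value per field by source authority and report every disagreement."""
    by_field: dict[str, list[Candidate]] = {}
    for cand in candidates:
        if cand.value is not None:
            by_field.setdefault(cand.field_name, []).append(cand)

    golden: dict[str, object] = {}
    conflicts: list[dict] = []
    for name, group in by_field.items():
        # Unknown document types sort last, behind every ranked source.
        ranked = sorted(
            group, key=lambda c: AUTHORITY_RANK.get(c.document_type, len(AUTHORITY_RANK))
        )
        winner = ranked[0]
        golden[name] = winner.value
        losing = [c for c in ranked[1:] if c.value != winner.value]
        if losing:
            conflicts.append({
                "field": name,
                "chosen": winner.value,
                "chosen_from": winner.document_type,
                "rejected": [(c.document_type, c.value) for c in losing],
            })
    return golden, conflicts


# Hypothetical candidates from three documents.
golden, conflicts = arbitrate([
    Candidate("sum_insured", 250000.0, "sof"),
    Candidate("sum_insured", 300000.0, "schedule"),
    Candidate("policy_number", "PT-001", "certificate"),
])
print(golden)
print(conflicts)
```

The rejected values are returned rather than discarded, which is what allows a reviewer to see that a disagreement existed and which source was preferred.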

How to read the map

Read each layer as a contract, not just a component group.

The diagram is intentionally layered because each row has a different ownership question. That makes the system easier to reason about, test, and harden later.

User and UI

Reviewers need a surface for inspecting evidence, correcting fields, and understanding conflicts.

human review boundary

API and session

FastAPI coordinates uploads, extraction calls, document serving, and review updates.

session state, not enterprise storage
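A review update can be pictured as a small state change plus a recorded decision. The sketch below uses plain Python rather than FastAPI so it stays self-contained; `ReviewSession`, `apply_review_update`, and the decision fields are hypothetical names, and the in-memory object stands in for demo session state rather than durable storage.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ReviewSession:
    """In-memory demo state; lost on restart, unlike durable production storage."""

    golden_record: dict
    decisions: list[dict] = field(default_factory=list)


def apply_review_update(
    session: ReviewSession,
    field_name: str,
    action: str,
    reviewer: str,
    new_value: object = None,
    note: str = "",
) -> dict:
    """Approve or override one field and record what changed."""
    if field_name not in session.golden_record:
        raise KeyError(f"unknown field: {field_name}")
    if action not in {"approve", "override"}:
        raise ValueError(f"unsupported action: {action}")

    previous = session.golden_record[field_name]
    if action == "override":
        session.golden_record[field_name] = new_value

    decision = {
        "field": field_name,
        "action": action,
        "previous": previous,
        "current": session.golden_record[field_name],
        "reviewer": reviewer,
        "note": note,
        "at": datetime.now(timezone.utc).isoformat(),
    }
    session.decisions.append(decision)
    return decision


# Hypothetical session and reviewer for illustration only.
session = ReviewSession(golden_record={"sum_insured": 300000.0})
decision = apply_review_update(
    session, "sum_insured", "override", reviewer="reviewer-1",
    new_value=275000.0, note="Endorsement supersedes schedule.",
)
print(decision)
```

Storing the previous value with each decision keeps the correction itself inspectable, which matters once this state moves from a demo session to persistent, audited storage.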

Extraction path

Docling parse and document classification narrow the task before the model is called; specialist field extraction and schema validation bound and check its output before any value is trusted.

bounded specialist work

Evidence path

Source text, page geometry, partial records, citations, and decisions remain inspectable alongside values.

field_citations.json

Trust path

The Golden Record is produced after authority rules and conflict handling, then passed to a reviewer-facing gate.

PolicyArbiter plus review
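A reviewer-facing gate can be as simple as a list of reasons the record is not yet ready. This sketch assumes plain dictionaries for the Golden Record, citations, conflicts, and review decisions; the function name `gate_status` and the two blocking rules are illustrative assumptions, not the project's actual gate criteria.

```python
def gate_status(
    golden: dict,
    citations: dict,
    conflicts: list[dict],
    decisions: list[dict],
) -> dict:
    """Summarize what still blocks a Golden Record from being treated as reviewed."""
    decided = {d["field"] for d in decisions}
    blockers: list[str] = []

    # Rule 1 (assumed): every populated field needs at least one citation.
    for name, value in golden.items():
        if value is not None and not citations.get(name):
            blockers.append(f"{name}: no citation")

    # Rule 2 (assumed): every detected conflict needs a reviewer decision.
    for conflict in conflicts:
        if conflict["field"] not in decided:
            blockers.append(f"{conflict['field']}: conflict not reviewed")

    return {"ready": not blockers, "blockers": blockers}


# Hypothetical inputs for illustration only.
status = gate_status(
    golden={"policy_number": "PT-001", "sum_insured": 300000.0},
    citations={"policy_number": [{"document": "schedule.pdf", "page": 1}]},
    conflicts=[{"field": "sum_insured"}],
    decisions=[],
)
print(status)
```

Returning the blockers, not just a boolean, keeps the gate consistent with the rest of the design: the reviewer sees why the record is held back.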

Implementation honesty

Show what the repo proves, and name what production would still require.

This is the difference between a credible architecture note and a sales diagram. The current project demonstrates a pattern. It does not claim to be an enterprise document platform.

Current project proves

1. Multi-PDF ingestion with Docling-based parsing and document classification.
2. Configured PII masking before structured Groq/Instructor extraction calls.
3. Pydantic partial records, PolicyArbiter rules, field citations, conflicts, and review UI.

Production hardening still needs

1. Authentication, RBAC, tenant-aware storage, and durable sessions.
2. Persistent artifact storage, audit logging, and retention controls.
3. Evaluation suites, monitoring, cost controls, deployment gates, and operational runbooks.

Next core chapter

The Golden Record problem.

The next chapter zooms into the trust layer: when documents disagree, which source wins, and how should that decision be exposed to the reviewer?