Pydantic as an AI Architecture Boundary
LLM output is not application state until it passes a typed boundary. Schemas, validation errors, evidence fields, and versioned contracts make AI workflows easier to inspect and change.
The first demo works because the model returns something that looks structured. A few files later, the date is in the wrong format, the amount includes a currency symbol, a required field is missing, an enum has a new value, and the evidence text does not support the answer.
The failure is not that JSON is bad. The failure is treating model-shaped JSON as application state before it crosses a typed boundary.
The model can propose data. The system has to accept or reject it.
In production AI workflows, the most important line is often not the prompt. It is the boundary where probabilistic output becomes a deterministic object the rest of the application is allowed to use.
Pydantic is useful here because it turns a loose dictionary into a typed object, raises validation errors, rejects unexpected shapes, and gives the workflow a clear place to retry, route, review, or stop.
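Here is a minimal sketch of that difference. It assumes Pydantic v2 and a hypothetical `PolicyTerms` model. The wrong-format date and the currency-prefixed amount from the opening example fail with field-level errors. Unambiguous values are coerced into real `date` and `Decimal` objects.

```python
from datetime import date
from decimal import Decimal

from pydantic import BaseModel, ValidationError


class PolicyTerms(BaseModel):
    # Hypothetical contract for illustration; field names are assumptions.
    policy_start: date
    premium: Decimal


# Model-shaped output: it looks structured, but both values are unusable.
loose = {"policy_start": "01/03/2024", "premium": "$1,200.00"}

try:
    PolicyTerms.model_validate(loose)
except ValidationError as exc:
    # Each error names the failing field, so the workflow knows what to retry or route.
    failed_fields = sorted(err["loc"][0] for err in exc.errors())
    print(failed_fields)  # ['policy_start', 'premium']

# The same fields in an unambiguous form become typed Python values.
terms = PolicyTerms.model_validate({"policy_start": "2024-03-01", "premium": "1200.00"})
print(type(terms.policy_start).__name__, terms.premium)  # date 1200.00
```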
JSON is not a contract
A JSON object can be syntactically valid and still be unsafe for the workflow. Serious systems need to know which fields are required, which values are allowed, what evidence supports each claim, and what should happen when validation fails.
- Shape: does the output contain the expected fields, nested objects and lists, with no silent extras?
- Constraints: are values constrained by types, enums, formats, ranges, and field-level rules?
- Evidence: does each important claim carry source context that a reviewer or evaluator can inspect?
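The sketch below shows those three checks against inputs that are all syntactically valid JSON. It assumes Pydantic v2. The `Claim` model, its status values and the `error_types` helper are hypothetical. `json.loads` accepts every candidate, but the contract rejects each broken one for a different reason.

```python
import json
from typing import Literal

from pydantic import BaseModel, ConfigDict, Field, ValidationError


class Evidence(BaseModel):
    model_config = ConfigDict(extra="forbid")
    document_id: str
    snippet: str = Field(min_length=1)


class Claim(BaseModel):
    # Hypothetical claim shape for illustration.
    model_config = ConfigDict(extra="forbid")          # shape: no silent extras
    status: Literal["covered", "excluded", "unclear"]  # constraint: closed set of values
    evidence: Evidence                                 # evidence: required, not optional


def error_types(raw: str) -> list[str]:
    """Return Pydantic error types for a raw JSON string; [] means accepted."""
    try:
        Claim.model_validate_json(raw)
        return []
    except ValidationError as exc:
        return [err["type"] for err in exc.errors()]


ok = '{"status": "covered", "evidence": {"document_id": "doc-1", "snippet": "Flood is covered."}}'
extra_key = '{"status": "covered", "evidence": {"document_id": "doc-1", "snippet": "x"}, "note": "hi"}'
new_enum = '{"status": "partially_covered", "evidence": {"document_id": "doc-1", "snippet": "x"}}'
no_evidence = '{"status": "covered"}'

for raw in (ok, extra_key, new_enum, no_evidence):
    json.loads(raw)  # every candidate parses as JSON
    print(error_types(raw))
# []
# ['extra_forbidden']
# ['literal_error']
# ['missing']
```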
Where the boundary sits
The boundary should sit between model generation and application state. The rest of the workflow should not have to guess whether the model returned the right object shape.
- Prompt: task instructions and source context.
- Model output: candidate JSON or structured response.
- Validation: parse, coerce, reject, and explain.
- Application state: a typed object the app can pass around.
- Evidence: source snippets, pages, and match checks.
- Review and evaluation: human checks and regression examples.
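A minimal sketch of where that line sits in code follows. It assumes Pydantic v2. `generate` is a hypothetical stand-in for whatever model client the workflow uses. `PolicyNumber`, `BoundaryRejected` and `extract` are illustrative names. The rest of the application only ever receives a `PolicyNumber` or an explicit rejection that keeps the raw output.

```python
from typing import Callable

from pydantic import BaseModel, ConfigDict, ValidationError


class PolicyNumber(BaseModel):
    model_config = ConfigDict(extra="forbid")
    policy_number: str


class BoundaryRejected(Exception):
    """Raised when candidate output must not become application state."""

    def __init__(self, raw_output: str, errors: list):
        super().__init__(f"{len(errors)} validation error(s)")
        self.raw_output = raw_output  # kept for logs, retries, and review
        self.errors = errors


def extract(prompt: str, generate: Callable[[str], str]) -> PolicyNumber:
    """Prompt -> model output -> validation -> typed object."""
    raw = generate(prompt)  # probabilistic: candidate JSON only
    try:
        return PolicyNumber.model_validate_json(raw)  # deterministic gate
    except ValidationError as exc:
        raise BoundaryRejected(raw, exc.errors()) from exc


def fake_model(prompt: str) -> str:
    # Hypothetical stub standing in for a real model call.
    return '{"policy_number": "PN-1042"}'


policy = extract("Extract the policy number.", fake_model)
print(policy.policy_number)  # PN-1042
```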
A minimal extraction contract
The exact model depends on the workflow, but the shape should make the boundary explicit: accepted field names, required evidence, allowed confidence range, and no unexpected keys.
```python
from typing import Literal

from pydantic import BaseModel, ConfigDict, Field, field_validator


class Evidence(BaseModel):
    # Reject any key the contract does not name.
    model_config = ConfigDict(extra="forbid")

    document_id: str
    page_number: int = Field(ge=1)      # pages are 1-indexed
    snippet: str = Field(min_length=1)  # evidence text cannot be empty


class ExtractedField(BaseModel):
    model_config = ConfigDict(extra="forbid")

    name: Literal["policy_number", "insured_name", "premium", "excess"]
    value: str
    evidence: Evidence                     # every field carries its source
    confidence: float = Field(ge=0, le=1)  # bounded to [0, 1]

    @field_validator("value")
    @classmethod
    def value_cannot_be_blank(cls, value: str) -> str:
        # Whitespace-only strings satisfy `str` but carry no information.
        if not value.strip():
            raise ValueError("value cannot be blank")
        return value.strip()
```
Validation failure is useful output
A failed parse is not just an exception. It is a workflow signal. It tells the system that this result should not quietly become application state.
- Ask a narrower prompt for only the missing or malformed part.
- Check deterministic constraints before asking the model to explain itself.
- Require source text before accepting important extracted claims.
- Route unresolved or high-risk failures to a human workflow.
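One way to turn a `ValidationError` into one of those routes is sketched below under stated assumptions. It reads the failing field paths from Pydantic v2's `ValidationError.errors()`. The `Premium` model, the `triage` and `narrow_prompt` helpers, and the routing rule are all hypothetical choices, not a fixed policy. The rule sends evidence failures to a human and gives everything else a narrow retry.

```python
from typing import Literal

from pydantic import BaseModel, ConfigDict, Field, ValidationError


class Evidence(BaseModel):
    model_config = ConfigDict(extra="forbid")
    document_id: str
    snippet: str = Field(min_length=1)


class Premium(BaseModel):
    model_config = ConfigDict(extra="forbid")
    amount: float = Field(gt=0)  # deterministic constraint, checked before any retry
    currency: Literal["GBP", "EUR", "USD"]
    evidence: Evidence


def triage(exc: ValidationError) -> tuple[str, list[str]]:
    """Map validation errors to a workflow decision (hypothetical policy)."""
    paths = [".".join(str(part) for part in err["loc"]) for err in exc.errors()]
    # Missing or malformed evidence should not be papered over by a retry.
    if any(path.startswith("evidence") for path in paths):
        return "human_review", paths
    # Otherwise ask a narrower follow-up for only the failing fields.
    return "retry_narrow", paths


def narrow_prompt(paths: list[str]) -> str:
    return "Re-extract only these fields, quoting the source text: " + ", ".join(paths)


try:
    Premium.model_validate(
        {"amount": -5, "currency": "GBP",
         "evidence": {"document_id": "d1", "snippet": "Annual premium: GBP 500"}}
    )
except ValidationError as exc:
    action, paths = triage(exc)
    print(action, paths)  # retry_narrow ['amount']
    print(narrow_prompt(paths))
```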
What belongs in the boundary
The boundary should be strict enough to protect the workflow, but honest about what validation can and cannot prove.
| Question | Good fit for schema validation | Needs another layer |
|---|---|---|
| Is the object shape correct? | Required fields, allowed fields, nested object shape, list structure, and unknown-key rejection. | Whether the model found the right part of the source document. |
| Is the value plausible? | Types, min/max bounds, date formats, enums, string lengths, and simple cross-field checks. | Whether the value is true when source documents disagree. |
| Is evidence present? | Required source document ID, page number, snippet text, match score, or citation field. | Whether the evidence is sufficient for a reviewer or business decision. |
| Can the workflow change safely? | Schema versions, validation error logs, migration rules, and evaluation fixtures. | Release decisions, reviewer policy, and operational ownership. |
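The sketch below pairs a simple cross-field check with a schema version tag. It assumes Pydantic v2. `PolicyPeriodV2` and the version string are hypothetical. The validator can prove the period is well-ordered, not that the dates match the source. The `Literal` version field makes a payload tagged with any other version fail loudly instead of being silently reinterpreted.

```python
from datetime import date
from typing import Literal

from pydantic import BaseModel, ConfigDict, ValidationError, model_validator


class PolicyPeriodV2(BaseModel):
    # Hypothetical contract; the version tag ties stored results to a schema.
    model_config = ConfigDict(extra="forbid")
    schema_version: Literal["policy_period.v2"] = "policy_period.v2"
    start_date: date
    end_date: date

    @model_validator(mode="after")
    def end_after_start(self) -> "PolicyPeriodV2":
        # Plausibility, not truth: this cannot tell which document is right.
        if self.end_date <= self.start_date:
            raise ValueError("end_date must be after start_date")
        return self


period = PolicyPeriodV2.model_validate({"start_date": "2024-03-01", "end_date": "2025-02-28"})
print(period.model_dump(mode="json"))
# {'schema_version': 'policy_period.v2', 'start_date': '2024-03-01', 'end_date': '2025-02-28'}

try:
    PolicyPeriodV2.model_validate({"start_date": "2025-03-01", "end_date": "2024-03-01"})
except ValidationError as exc:
    print(exc.errors()[0]["type"])  # value_error
```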
Structured outputs help, but they are not the whole boundary
Structured outputs, Instructor, JSON Schema, and provider-specific response formats can make model output easier to parse. That is useful. It still does not prove that the value is correct, supported by evidence, or acceptable for the workflow.
A structured output can satisfy the object contract while still choosing the wrong date, wrong total, wrong clause, or wrong source. The boundary should make the claim inspectable; evidence, evaluation, and review decide whether the workflow should trust it.
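A deliberately simple match check that makes a claim inspectable is sketched below. The helper names are hypothetical, and substring matching is a heuristic chosen for illustration. It flags a value that does not appear in its own evidence snippet. It cannot detect a correct-looking value quoted from the wrong clause, which is still a job for evaluation and review.

```python
import re


def normalise(text: str) -> str:
    """Lowercase and collapse whitespace so layout differences do not cause misses."""
    return re.sub(r"\s+", " ", text).strip().lower()


def value_in_snippet(value: str, snippet: str) -> bool:
    """Heuristic evidence check: does the cited snippet contain the extracted value?"""
    return normalise(value) in normalise(snippet)


snippet = "The policy excess is\nGBP 250 for each and every claim."
print(value_in_snippet("GBP 250", snippet))  # True
print(value_in_snippet("GBP 500", snippet))  # False
```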
| Boundary choice | What it helps with | What to watch |
|---|---|---|
| Provider structured outputs | Reduces schema-shape failures before the response reaches your application. | Provider-specific behavior, schema limits, and the fact that structured output still does not prove the value is true. |
| Instructor or client retry loop | Turns validation errors into targeted correction attempts using your Pydantic model and error messages. | Retry loops need limits, logs, and evaluation. Otherwise they can hide unstable prompts behind repeated repair attempts. |
| Application-side Pydantic boundary | Acts as the final acceptance gate before model output becomes backend state, API response, or review UI data. | It protects shape and constraints, not truth. Evidence, arbitration, evaluation, and review still have to do their jobs. |
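The bounded, logged repair loop below is written without Instructor, so it assumes no library-specific API. `call_model` is a hypothetical stand-in for a model client, and `Excess` is an illustrative schema. The loop feeds the concrete validation errors back to the model and stops after `max_attempts`. It returns every raw attempt, so repeated repairs remain visible rather than hidden.

```python
import logging
from typing import Callable, Optional

from pydantic import BaseModel, ConfigDict, Field, ValidationError

logger = logging.getLogger("boundary")


class Excess(BaseModel):
    model_config = ConfigDict(extra="forbid")
    amount: float = Field(ge=0)
    currency: str = Field(pattern=r"^[A-Z]{3}$")  # ISO-style three-letter code


def extract_with_retries(
    prompt: str,
    call_model: Callable[[str], str],
    max_attempts: int = 3,
) -> tuple[Optional[Excess], list[str]]:
    """Return the accepted object (or None) plus the raw output of every attempt."""
    attempts: list[str] = []
    next_prompt = prompt
    for attempt in range(1, max_attempts + 1):
        raw = call_model(next_prompt)
        attempts.append(raw)
        try:
            return Excess.model_validate_json(raw), attempts
        except ValidationError as exc:
            logger.warning("attempt %d/%d rejected with %d error(s)",
                           attempt, max_attempts, exc.error_count())
            # Targeted correction: include the actual errors, not a generic "try again".
            next_prompt = f"{prompt}\n\nThe previous answer failed validation:\n{exc}\nReturn corrected JSON only."
    return None, attempts  # exhausted: route to review instead of looping forever


replies = iter(['{"amount": 250, "currency": "£"}', '{"amount": 250, "currency": "GBP"}'])
result, history = extract_with_retries("Extract the excess as JSON.", lambda prompt: next(replies))
print(result.currency if result else None, len(history))  # GBP 2
```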
Patterns to avoid at the boundary:
- Treating valid JSON as a validated workflow result.
- Making every field optional to avoid handling failures.
- Silently allowing unexpected fields when the workflow depends on a known contract.
- Validating only the final answer after intermediate mistakes have already shaped the workflow.
- Dropping raw model output, validation errors, schema version, or source evidence.
- Using schemas as a substitute for provenance, arbitration, evaluation, or human review.
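The sketch below keeps raw output, errors and schema version for every decision. It assumes Pydantic v2. `BoundaryRecord` and `record` are hypothetical names, and the field set is one reasonable choice rather than a standard. Rejected outputs get a record just like accepted ones, so failures remain available for evaluation.

```python
import json
from datetime import datetime, timezone
from typing import Any, Optional

from pydantic import BaseModel, ValidationError


class BoundaryRecord(BaseModel):
    """Hypothetical audit record written for every boundary decision."""
    schema_name: str
    schema_version: str
    raw_output: str                          # exactly what the model returned
    accepted: bool
    errors: list[dict[str, Any]] = []        # validation errors, if rejected
    parsed: Optional[dict[str, Any]] = None  # typed result as JSON, if accepted
    recorded_at: datetime


def record(model: type[BaseModel], version: str, raw: str) -> BoundaryRecord:
    common = {
        "schema_name": model.__name__,
        "schema_version": version,
        "raw_output": raw,
        "recorded_at": datetime.now(timezone.utc),
    }
    try:
        obj = model.model_validate_json(raw)
    except ValidationError as exc:
        # exc.json() yields JSON-safe error dicts; errors() can hold exception objects.
        return BoundaryRecord(**common, accepted=False, errors=json.loads(exc.json()))
    return BoundaryRecord(**common, accepted=True, parsed=obj.model_dump(mode="json"))


class Excess(BaseModel):
    amount: float


ok = record(Excess, "excess.v1", '{"amount": 250}')
bad = record(Excess, "excess.v1", '{"amount": "two hundred"}')
print(ok.accepted, ok.parsed)               # True {'amount': 250.0}
print(bad.accepted, bad.errors[0]["type"])  # False float_parsing
```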
Implementation options to test
Start with the smallest boundary that protects the workflow. Add provider-constrained generation, retries, and graph orchestration when the workflow actually needs them.
| Need | Implementation options | What to evaluate |
|---|---|---|
| Python object contracts | Pydantic models, field constraints, validators, and `extra="forbid"`. | Whether bad model output fails loudly before it enters application state. |
| Provider-constrained output | OpenAI structured outputs or other schema-aware response formats. | Whether the provider can reduce parse failures while still letting your app validate business rules. |
| LLM client retries | Instructor or custom retry loops around typed outputs. | Whether retry behavior is observable, bounded, and tied to validation errors. |
| API and UI boundaries | FastAPI/Pydantic on the backend, JSON Schema or Zod-style contracts on the frontend. | Whether the same contract protects model output, API responses, and reviewer-facing UI state. |
| Release safety | Schema versions, stored validation failures, golden examples, and regression gates. | Whether a prompt, model, or schema change can be evaluated before it reaches users. |
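The regression gate below uses golden examples. It assumes Pydantic v2 and stored model outputs labelled with whether they should pass. `GOLDEN`, `regression_gate` and both model names are hypothetical. The gate reports which fixtures changed outcome, so a loosened schema shows up before release rather than in production.

```python
from typing import Literal

from pydantic import BaseModel, ConfigDict, ValidationError


class ExtractedFieldV1(BaseModel):
    model_config = ConfigDict(extra="forbid")
    name: Literal["policy_number", "insured_name", "premium", "excess"]
    value: str


# Hypothetical golden fixtures: stored outputs and whether they must be accepted.
GOLDEN = [
    ('{"name": "premium", "value": "1200.00"}', True),
    ('{"name": "premium", "value": "1200.00", "currency": "GBP"}', False),
    ('{"name": "deductible", "value": "250"}', False),
]


def regression_gate(model: type[BaseModel], fixtures: list[tuple[str, bool]]) -> list[int]:
    """Return the indexes of fixtures whose accept/reject outcome changed."""
    changed = []
    for i, (raw, should_pass) in enumerate(fixtures):
        try:
            model.model_validate_json(raw)
            passed = True
        except ValidationError:
            passed = False
        if passed != should_pass:
            changed.append(i)
    return changed


class LooseV2(ExtractedFieldV1):
    model_config = ConfigDict(extra="allow")  # a risky schema change


print(regression_gate(ExtractedFieldV1, GOLDEN))  # []
print(regression_gate(LooseV2, GOLDEN))           # [1]
```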
Where this shows up
This pattern appears anywhere model output needs to enter a workflow that other software, reviewers, or evaluators depend on.
PolicyTrace uses typed extraction models with Instructor/Pydantic so document fields, citations, conflicts, and review state can move through the workflow as bounded objects.
A contract system would need typed clauses, obligations, parties, dates, amendments, risk flags, and evidence references before review.
An invoice workflow would need typed supplier identity, totals, tax lines, PO references, line items, and exception reasons.
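An invoice contract could start from the sketch below. It assumes Pydantic v2, and every field name is hypothetical. It also assumes that line-item amounts are net and that the total equals their sum plus tax, which is a business rule each workflow would need to confirm. `Decimal` avoids floating-point drift when money is compared.

```python
from decimal import Decimal
from typing import Optional

from pydantic import BaseModel, ConfigDict, Field, ValidationError, model_validator


class LineItem(BaseModel):
    model_config = ConfigDict(extra="forbid")
    description: str = Field(min_length=1)
    amount: Decimal


class Invoice(BaseModel):
    model_config = ConfigDict(extra="forbid")
    supplier_id: str = Field(min_length=1)
    po_reference: Optional[str] = None
    line_items: list[LineItem] = Field(min_length=1)
    tax: Decimal = Field(ge=0)
    total: Decimal
    exception_reasons: list[str] = []  # filled by the workflow when a rule flags something

    @model_validator(mode="after")
    def total_matches_lines(self) -> "Invoice":
        # Assumed rule: total = sum of line items + tax.
        expected = sum((item.amount for item in self.line_items), Decimal("0")) + self.tax
        if expected != self.total:
            raise ValueError(f"total {self.total} != line items + tax {expected}")
        return self


payload = {
    "supplier_id": "SUP-7",
    "po_reference": "PO-123",
    "line_items": [
        {"description": "Widgets", "amount": "100.00"},
        {"description": "Bolts", "amount": "20.00"},
    ],
    "tax": "24.00",
    "total": "144.00",
}
invoice = Invoice.model_validate(payload)
print(invoice.total)  # 144.00

try:
    Invoice.model_validate({**payload, "total": "150.00"})
except ValidationError as exc:
    print(exc.error_count())  # 1
```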
The practical takeaway
Pydantic is not magic, and schemas do not make model output true. Their job is more practical: stop vague model output from becoming invisible application state.
That distinction matters. Once the claim is typed, versioned, validated, and connected to evidence, the workflow can retry it, compare it, evaluate it, route it to review, or reject it.
This post sits inside the Orchestration Layer. The next natural layer is how typed outputs become evidence-backed, reviewable system state.