Pydantic as an AI Architecture Boundary

LLM output is not application state until it passes a typed boundary. Schemas, validation errors, evidence fields, and versioned contracts make AI workflows easier to inspect and change.

By AI-ToolStack

May 25, 2026


The model returned JSON. That does not mean the system has a valid object.

The first demo works because the model returns something that looks structured. A few files later, the date is in the wrong format, the amount includes a currency symbol, a required field is missing, an enum has a new value, and the evidence text does not support the answer.

The failure is not that JSON is bad. The failure is treating model-shaped JSON as application state before it crosses a typed boundary.

The model can propose data. The system has to accept or reject it.

In production AI workflows, the most important line is often not the prompt. It is the boundary where probabilistic output becomes a deterministic object the rest of the application is allowed to use.

A schema boundary is where model output becomes inspectable workflow state.

Pydantic is useful here because it turns a loose dictionary into a typed object, raises validation errors that name the failing fields, rejects unexpected keys when the model is configured with `extra="forbid"`, and gives the workflow a clear place to retry, route, review, or stop.
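
As a minimal sketch of that step, assuming Pydantic v2 and a hypothetical `PremiumQuote` model invented for this example, a loose dictionary either becomes a typed object or produces errors that name the failing field:

```python
from pydantic import BaseModel, ValidationError


class PremiumQuote(BaseModel):
    # Hypothetical model, used only for this illustration.
    policy_number: str
    premium: float


# A loose dictionary, as it might look after json.loads() on model output.
candidate = {"policy_number": "PN-1042", "premium": "$1,250.00"}

try:
    quote = PremiumQuote.model_validate(candidate)
    outcome = "accepted"
    failed_fields = []
except ValidationError as exc:
    # Each error names the failing field, which gives the workflow
    # something concrete to retry, route, or review.
    failed_fields = [".".join(str(part) for part in err["loc"]) for err in exc.errors()]
    outcome = "rejected"

print(outcome, failed_fields)
```

The currency symbol and thousands separator stop the premium from parsing as a float, so the candidate is rejected at the boundary instead of travelling further as a string that merely looks like a number.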

JSON is not a contract

A JSON object can be syntactically valid and still be unsafe for the workflow. Serious systems need to know which fields are required, which values are allowed, what evidence supports each claim, and what should happen when validation fails.

1. Shape: Does the output contain the expected fields, nested objects, and lists, with no silent extras?

2. Meaning: Are values constrained by types, enums, formats, ranges, and field-level rules?

3. Evidence: Does each important claim carry source context a reviewer or evaluator can inspect?
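
One way to picture the three questions is a single model in which each line of the contract answers one of them. This is a sketch assuming Pydantic v2; the `Claim` model and its field names are hypothetical, not part of any real workflow.

```python
from typing import Literal

from pydantic import BaseModel, ConfigDict, Field, ValidationError


class Claim(BaseModel):
    # Hypothetical model for illustration only.
    model_config = ConfigDict(extra="forbid")  # shape: no silent extras

    name: Literal["premium", "excess"]         # meaning: closed set of names
    amount: float = Field(ge=0)                # meaning: range constraint
    source_snippet: str = Field(min_length=1)  # evidence: must be present


def error_types(payload: dict) -> list:
    """Return the Pydantic error types for a payload, or [] if it validates."""
    try:
        Claim.model_validate(payload)
        return []
    except ValidationError as exc:
        return [err["type"] for err in exc.errors()]


good = {"name": "premium", "amount": 120.0, "source_snippet": "Premium: 120.00"}
shape_problem = {**good, "notes": "model added this key"}
meaning_problem = {**good, "name": "deductible", "amount": -5}
evidence_problem = {k: v for k, v in good.items() if k != "source_snippet"}

for label, payload in [
    ("good", good),
    ("shape", shape_problem),
    ("meaning", meaning_problem),
    ("evidence", evidence_problem),
]:
    print(label, error_types(payload))
```

Note that the evidence check here only proves a snippet was supplied. Whether the snippet actually supports the claim is a separate question that the schema cannot answer.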

Where the boundary sits

The boundary should sit between model generation and application state. The rest of the workflow should not have to guess whether the model returned the right object shape.

1. Prompt: Task instructions and source context.

2. Model output: Candidate JSON or structured response.

3. Schema boundary: Parse, coerce, reject, and explain.

4. Workflow state: Typed object the app can pass around.

5. Evidence: Source snippets, pages, and match checks.

6. Review or eval: Human checks and regression examples.
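
Step 3 can be sketched as one function that every model response has to pass through. The example assumes Pydantic v2; `PolicySummary`, `BoundaryResult`, and `cross_boundary` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional

from pydantic import BaseModel, ConfigDict, ValidationError


class PolicySummary(BaseModel):
    # Hypothetical workflow-state model.
    model_config = ConfigDict(extra="forbid")

    policy_number: str
    insured_name: str


@dataclass
class BoundaryResult:
    raw_output: str                            # always kept for inspection
    accepted: Optional[PolicySummary] = None   # typed state, if validation passed
    errors: list = field(default_factory=list)


def cross_boundary(raw_output: str) -> BoundaryResult:
    """Parse and validate in one step; never return a half-trusted dict."""
    try:
        obj = PolicySummary.model_validate_json(raw_output)
    except ValidationError as exc:
        # Malformed JSON and schema violations both surface here.
        return BoundaryResult(raw_output, errors=exc.errors(include_url=False))
    return BoundaryResult(raw_output, accepted=obj)


ok = cross_boundary('{"policy_number": "PN-1", "insured_name": "A. Example"}')
not_json = cross_boundary("Sure! Here is the JSON you asked for")
incomplete = cross_boundary('{"policy_number": "PN-1"}')
```

Downstream code then only has to check `accepted`; it never needs to guess whether a dictionary has the right keys.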

A minimal extraction contract

The exact model depends on the workflow, but the shape should make the boundary explicit: accepted field names, required evidence, allowed confidence range, and no unexpected keys.

from typing import Literal

from pydantic import BaseModel, ConfigDict, Field, field_validator


class Evidence(BaseModel):
    model_config = ConfigDict(extra="forbid")

    document_id: str
    page_number: int = Field(ge=1)
    snippet: str = Field(min_length=1)


class ExtractedField(BaseModel):
    model_config = ConfigDict(extra="forbid")

    name: Literal["policy_number", "insured_name", "premium", "excess"]
    value: str
    evidence: Evidence
    confidence: float = Field(ge=0, le=1)

    @field_validator("value")
    @classmethod
    def value_cannot_be_blank(cls, value: str) -> str:
        if not value.strip():
            raise ValueError("value cannot be blank")
        return value.strip()
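
To see the contract behave as a boundary, the sketch below repeats the two models so that it runs on its own, then validates one well-formed payload and one malformed payload. It assumes Pydantic v2; the payload values (`doc-001`, the snippet text, the `reviewer_note` key) are invented for illustration.

```python
from typing import Literal

from pydantic import BaseModel, ConfigDict, Field, ValidationError, field_validator


class Evidence(BaseModel):
    model_config = ConfigDict(extra="forbid")

    document_id: str
    page_number: int = Field(ge=1)
    snippet: str = Field(min_length=1)


class ExtractedField(BaseModel):
    model_config = ConfigDict(extra="forbid")

    name: Literal["policy_number", "insured_name", "premium", "excess"]
    value: str
    evidence: Evidence
    confidence: float = Field(ge=0, le=1)

    @field_validator("value")
    @classmethod
    def value_cannot_be_blank(cls, value: str) -> str:
        if not value.strip():
            raise ValueError("value cannot be blank")
        return value.strip()


good = {
    "name": "premium",
    "value": "  1250.00 ",
    "evidence": {"document_id": "doc-001", "page_number": 3, "snippet": "Annual premium: 1250.00"},
    "confidence": 0.92,
}
accepted = ExtractedField.model_validate(good)  # value is stripped on the way in

bad = {
    "name": "deductible",          # not in the allowed names
    "value": "   ",                # blank after stripping
    "evidence": {"document_id": "doc-001", "page_number": 0, "snippet": ""},
    "confidence": 1.4,             # outside 0..1
    "reviewer_note": "unexpected", # extra key
}
try:
    ExtractedField.model_validate(bad)
    problems = {}
except ValidationError as exc:
    # Map each failing location to its Pydantic error type.
    problems = {".".join(str(p) for p in err["loc"]): err["type"] for err in exc.errors()}

for location, error_type in problems.items():
    print(location, error_type)
```

Pydantic reports all the failures in a single error, including the nested evidence fields, rather than stopping at the first one. That makes one rejected payload enough to plan a targeted retry.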

Validation failure is useful output

A failed parse is not just an exception. It is a workflow signal. It tells the system that this result should not quietly become application state.

Retry smaller: Ask a narrower prompt for only the missing or malformed part.

Validate rules: Check deterministic constraints before asking the model to explain itself.

Request evidence: Require source text before accepting important extracted claims.

Send to review: Route unresolved or high-risk failures to a human workflow.
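
The options above can be sketched as a small routing function that reads the validation errors and picks a next step. This assumes Pydantic v2; the `Finding` model, the `Action` names, and the routing policy itself are hypothetical, and a real workflow would tune the policy to its own risk levels.

```python
from enum import Enum

from pydantic import BaseModel, ConfigDict, Field, ValidationError


class Action(str, Enum):
    ACCEPT = "accept"
    RETRY_SMALLER = "retry_smaller"
    REQUEST_EVIDENCE = "request_evidence"
    SEND_TO_REVIEW = "send_to_review"


class Finding(BaseModel):
    # Hypothetical model: the deterministic rules live here, so they run
    # before anyone asks the model to explain itself.
    model_config = ConfigDict(extra="forbid")

    premium: float = Field(ge=0)
    snippet: str = Field(min_length=1)


def decide(payload: dict, attempts_left: int) -> Action:
    """Turn a validation outcome into a workflow decision."""
    try:
        Finding.model_validate(payload)
        return Action.ACCEPT
    except ValidationError as exc:
        failed = {err["loc"][0] if err["loc"] else "<root>" for err in exc.errors()}

    if attempts_left <= 0:
        return Action.SEND_TO_REVIEW    # unresolved: hand over to a human
    if "snippet" in failed:
        return Action.REQUEST_EVIDENCE  # ask for source text first
    return Action.RETRY_SMALLER         # re-ask only for the failing fields


print(decide({"premium": "n/a", "snippet": "Premium: see schedule"}, attempts_left=2))
```

Keeping the decision in one function means the retry and review policy can be logged, tested, and changed without touching the prompt.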

What belongs in the boundary

The boundary should be strict enough to protect the workflow, but honest about what validation can and cannot prove.

Question	Good fit for schema validation	Needs another layer
Is the object shape correct?	Required fields, allowed fields, nested object shape, list structure, and unknown-key rejection.	Whether the model found the right part of the source document.
Is the value plausible?	Types, min/max bounds, date formats, enums, string lengths, and simple cross-field checks.	Whether the value is true when source documents disagree.
Is evidence present?	Required source document ID, page number, snippet text, match score, or citation field.	Whether the evidence is sufficient for a reviewer or business decision.
Can the workflow change safely?	Schema versions, validation error logs, migration rules, and evaluation fixtures.	Release decisions, reviewer policy, and operational ownership.
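
The "needs another layer" column can start small. As one hedged example, a check that the quoted snippet really appears on the cited page sits just outside the schema. The function names and page texts below are hypothetical, and a verbatim match only shows the snippet exists, not that it is sufficient for a decision.

```python
def normalize(text: str) -> str:
    """Collapse whitespace and ignore case so layout differences do not matter."""
    return " ".join(text.split()).casefold()


def snippet_found_on_page(snippet: str, page_number: int, pages: dict) -> bool:
    """True if the quoted snippet appears on the cited page of the source text."""
    page_text = pages.get(page_number)
    if page_text is None or not snippet.strip():
        return False
    return normalize(snippet) in normalize(page_text)


# Hypothetical page texts keyed by 1-based page number.
pages = {
    1: "Policy schedule\nInsured: A. Example Ltd",
    2: "Annual premium:   1,250.00\nExcess: 250.00",
}

supported = snippet_found_on_page("annual premium: 1,250.00", 2, pages)
wrong_page = snippet_found_on_page("Annual premium: 1,250.00", 1, pages)
missing_page = snippet_found_on_page("Annual premium: 1,250.00", 9, pages)
```

A result like `wrong_page` is schema-valid evidence that still fails the match check, which is exactly the kind of case that belongs in review or evaluation rather than in the type definitions.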

Structured outputs help, but they are not the whole boundary

Structured outputs, Instructor, JSON Schema, and provider-specific response formats can make model output easier to parse. That is useful. It still does not prove that the value is correct, supported by evidence, or acceptable for the workflow.

Schema-valid is not the same as true.

A structured output can satisfy the object contract while still choosing the wrong date, wrong total, wrong clause, or wrong source. The boundary should make the claim inspectable; evidence, evaluation, and review decide whether the workflow should trust it.
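
A short sketch makes the gap concrete. It assumes Pydantic v2, whose `model_json_schema()` method exports a JSON Schema that schema-aware tooling can consume; the `ExtractedAmount` model and the figures are hypothetical, and how a given provider accepts the schema is not shown because it varies.

```python
from typing import Literal

from pydantic import BaseModel, ConfigDict, Field


class ExtractedAmount(BaseModel):
    # Hypothetical model for illustration only.
    model_config = ConfigDict(extra="forbid")

    name: Literal["premium", "excess"]
    amount: float = Field(ge=0)


# The same contract, exported as JSON Schema.
schema = ExtractedAmount.model_json_schema()
print(schema["required"], schema["additionalProperties"])

# Suppose the source document says the premium is 1250.00, but the model
# transposed two digits. The object is still perfectly schema-valid.
transposed = ExtractedAmount.model_validate({"name": "premium", "amount": 1520.00})
print(transposed)
```

Nothing in the schema can notice the transposed digits. Only a comparison against the source, an evaluation fixture, or a reviewer can.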

Boundary choice	What it helps with	What to watch
Provider structured outputs	Reduces schema-shape failures before the response reaches your application.	Provider-specific behavior, schema limits, and the fact that structured output still does not prove the value is true.
Instructor or client retry loop	Turns validation errors into targeted correction attempts using your Pydantic model and error messages.	Retry loops need limits, logs, and evaluation. Otherwise they can hide unstable prompts behind repeated repair attempts.
Application-side Pydantic boundary	Acts as the final acceptance gate before model output becomes backend state, API response, or review UI data.	It protects shape and constraints, not truth. Evidence, arbitration, evaluation, and review still have to do their jobs.
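
For the retry row, a bounded loop might look like the sketch below. It is a hand-rolled illustration rather than Instructor's API, and it assumes Pydantic v2: `generate` stands in for whatever function calls the model, and `fake_generate` is a stub that replays canned responses so the example runs offline.

```python
from typing import Callable, Optional

from pydantic import BaseModel, ValidationError


class Answer(BaseModel):
    # Hypothetical target model.
    policy_number: str
    premium: float


def validated_with_retries(
    generate: Callable[[str], str], prompt: str, max_attempts: int = 3
) -> tuple:
    """Return (Answer or None, log). The loop is bounded and every failure is logged."""
    log = []
    current_prompt = prompt
    for attempt in range(1, max_attempts + 1):
        raw = generate(current_prompt)
        try:
            return Answer.model_validate_json(raw), log
        except ValidationError as exc:
            summary = "; ".join(
                f"{'.'.join(str(p) for p in err['loc'])}: {err['msg']}" for err in exc.errors()
            )
            log.append(f"attempt {attempt}: {summary}")
            # Feed the validation errors back as a targeted correction request.
            current_prompt = f"{prompt}\n\nFix these validation errors: {summary}"
    return None, log  # give up visibly instead of looping forever


# Stub model call: first response is malformed, second is valid.
replies = iter([
    '{"policy_number": "PN-7", "premium": "unknown"}',
    '{"policy_number": "PN-7", "premium": 310.5}',
])


def fake_generate(prompt: str) -> str:
    return next(replies)


answer, log = validated_with_retries(fake_generate, "Extract the policy number and premium.")
print(answer, log)
```

Returning the log alongside the result keeps repair attempts observable, so an unstable prompt shows up as a rising retry count instead of hiding behind eventual success.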

Do not do this.

Treat valid JSON as a validated workflow result.
Make every field optional to avoid handling failures.
Allow unexpected fields silently when the workflow depends on a known contract.
Validate only the final answer after intermediate mistakes have already shaped the workflow.
Drop raw model output, validation errors, schema version, or source evidence.
Use schemas as a substitute for provenance, arbitration, evaluation, or human review.
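
To avoid the fifth mistake in that list, the boundary can emit a record that keeps everything needed to reconstruct the decision later. This is a sketch assuming Pydantic v2; `SCHEMA_VERSION`, `PremiumResult`, and the record layout are hypothetical choices, not a standard format.

```python
import json
from datetime import datetime, timezone

from pydantic import BaseModel, ValidationError

SCHEMA_VERSION = "1.0.0"  # hypothetical version label for this contract


class PremiumResult(BaseModel):
    # Hypothetical model for illustration only.
    premium: float


def audit_record(raw_output: str) -> dict:
    """Validate raw model output and keep the raw text, errors, and version together."""
    record = {
        "schema_version": SCHEMA_VERSION,
        "received_at": datetime.now(timezone.utc).isoformat(),
        "raw_output": raw_output,
        "accepted": None,
        "errors": [],
    }
    try:
        record["accepted"] = PremiumResult.model_validate_json(raw_output).model_dump()
    except ValidationError as exc:
        # Drop URLs and context objects so the errors stay JSON-serializable.
        record["errors"] = exc.errors(include_url=False, include_context=False)
    return record


rejected = audit_record('{"premium": "n/a"}')
accepted = audit_record('{"premium": 99.5}')
print(json.dumps(rejected, indent=2))
```

Stored records like these double as raw material for evaluation: rejected outputs become regression cases, and the version label shows which contract each one was judged against.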

Implementation options to test

Start with the smallest boundary that protects the workflow. Add provider-constrained generation, retries, and graph orchestration when the workflow actually needs them.

Need	Implementation options	What to evaluate
Python object contracts	Pydantic models, field constraints, validators, and `extra="forbid"`.	Whether bad model output fails loudly before it enters application state.
Provider-constrained output	OpenAI structured outputs or other schema-aware response formats.	Whether the provider can reduce parse failures while still letting your app validate business rules.
LLM client retries	Instructor or custom retry loops around typed outputs.	Whether retry behavior is observable, bounded, and tied to validation errors.
API and UI boundaries	FastAPI/Pydantic on the backend, JSON Schema or Zod-style contracts on the frontend.	Whether the same contract protects model output, API responses, and reviewer-facing UI state.
Release safety	Schema versions, stored validation failures, golden examples, and regression gates.	Whether a prompt, model, or schema change can be evaluated before it reaches users.
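
For the release-safety row, a regression gate can be as simple as replaying golden examples against a candidate schema. The sketch assumes Pydantic v2; the two models, the `GOLDEN` examples, and `regression_failures` are hypothetical and kept deliberately small.

```python
from pydantic import BaseModel, ConfigDict, ValidationError


class PolicyFieldsV1(BaseModel):
    # Current contract: unknown keys are rejected.
    model_config = ConfigDict(extra="forbid")

    policy_number: str
    premium: float


class PolicyFieldsLoosened(BaseModel):
    # Candidate change that silently drops extra="forbid".
    policy_number: str
    premium: float


# (payload, should_validate) pairs collected from earlier runs.
GOLDEN = [
    ({"policy_number": "PN-1", "premium": 100.0}, True),
    ({"policy_number": "PN-1"}, False),
    ({"policy_number": "PN-1", "premium": 100.0, "comment": "extra"}, False),
]


def regression_failures(model: type, examples: list) -> list:
    """Return indexes of golden examples whose outcome differs from what is expected."""
    failures = []
    for index, (payload, should_validate) in enumerate(examples):
        try:
            model.model_validate(payload)
            validated = True
        except ValidationError:
            validated = False
        if validated != should_validate:
            failures.append(index)
    return failures


current = regression_failures(PolicyFieldsV1, GOLDEN)
candidate = regression_failures(PolicyFieldsLoosened, GOLDEN)
print(current, candidate)
```

The loosened model starts accepting the payload with the unexpected key, so the gate flags that example before the change reaches users.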

Where this shows up

This pattern appears anywhere model output needs to enter a workflow that other software, reviewers, or evaluators depend on.

PolicyTrace: Uses typed extraction models with Instructor/Pydantic so document fields, citations, conflicts, and review state can move through the workflow as bounded objects.

Future ContractCopilot: A contract system would need typed clauses, obligations, parties, dates, amendments, risk flags, and evidence references before review.

Future invoice intelligence: An invoice workflow would need typed supplier identity, totals, tax lines, PO references, line items, and exception reasons.

The practical takeaway

Pydantic is not magic, and schemas do not make model output true. Their job is more practical: stop vague model output from becoming invisible application state.

A validated object is not the truth. It is a claim the system can inspect.

That distinction matters. Once the claim is typed, versioned, validated, and connected to evidence, the workflow can retry it, compare it, evaluate it, route it to review, or reject it.


This post sits inside the Orchestration Layer. The next natural layer is how typed outputs become evidence-backed, reviewable system state.
