Reliable Interfaces for LLMs

Prompt Contracts for Reliable Agents

March 14, 2026 · AI

Most prompt failures are not language failures. They are interface failures.

Teams describe an agent with fuzzy goals like "be helpful," "use tools when needed," and "act like a senior engineer." Then they act surprised when the system becomes inconsistent across edge cases. That is the software equivalent of deploying an API with no schema.

Prompts Should Behave Like Contracts

A production prompt needs the same clarity we expect from any interface:

What inputs are guaranteed to exist.
What the model may assume versus what it must verify.
When a tool call is required before answering.
What output shape is acceptable.
What failure behavior is preferred when information is missing.

If those rules live only in the developer's head, the agent has no stable operating boundary.

Separate Policy from Persona

Persona text is the least important part of most agent prompts. Policy text is what matters.

Policy:
- Verify time-sensitive claims before answering.
- Never invent tool results.
- Ask for clarification only when a wrong assumption is risky.

Persona:
- Direct
- Calm
- Technical

When persona overwhelms policy, the model sounds confident while doing the wrong thing. That is worse than a dry answer that is correct.

Define Tool Preconditions Explicitly

Do not write "use the web if necessary." Write the condition.

Use search when:
- the answer may have changed recently
- the user asks for latest information
- a specific URL or paper is referenced
- a mistake would create financial, legal, or medical risk

Concrete trigger conditions reduce variance. They also make evaluation possible because reviewers can score whether the agent followed the contract.

State the Failure Mode You Want

An unreliable agent is often one that is punished implicitly for uncertainty. If every hesitant answer feels like failure, the model learns to bluff.

Tell it what to do instead:

Say what is known.
Flag the missing piece.
Take the next safe action, such as searching or asking one short question.

Contract Drift Is Real

Prompts rot the same way APIs rot. More constraints get appended. Old rules contradict new ones. Tool descriptions evolve while the system prompt stays frozen.

Review prompts as interfaces. Version them. Remove dead rules. Merge duplicates. Test the contract with adversarial inputs, not just happy-path demos.

Reliable Agents Are Boring on Purpose

The goal is not a prompt that feels magical. The goal is a prompt that behaves predictably when the request is ambiguous, the context is incomplete, and the clock is running.

That is what contracts are for. They turn prompting from folk art into systems engineering.

← Back to Home