Six Reliability Primitives for LLM Agents

Reliability concerns for LLM agents are typically bundled into one heavy framework that asks you to adopt prompting, tool routing, and runtime governance as a single dependency. Production teams want them à la carte. They want small primitives they can drop in around existing tool calls without buying into a new programming model.

That observation is the design centre of agent-stack: six small, single-concern reliability libraries published independently to npm, PyPI, and the Model Context Protocol registry. Each library is zero-dependency, under 500 lines of code, and addresses one specific failure mode that production agent teams have to handle.

This post is a tour of the six primitives, the cross-cutting invariants they enforce, and the trade-offs of "composable by inclusion" instead of "composable by framework."

The six primitives

Library	Concern	Failure mode it addresses
AgentFit	Context-window fitting	Token-aware truncation. Pluggable tokenizers for OpenAI / Anthropic / open models.
AgentGuard	Network egress allowlist	Blocks "agent suddenly POSTs PHI/secrets to attacker.com" at the network layer.
AgentSnap	Snapshot tests for tool calls	Catches silent regressions when a model's tool-call shape changes after a deploy.
AgentVet	Tool-arg validation	Throws `ToolArgError` with an LLM-friendly retry hint, so the next turn can self-correct.
AgentCast	Structured-output retry	BYO-LLM JSON validator + retry loop for malformed responses during brown-outs.
AgentBudget	Token + dollar caps	Prevents runaway loops billing $1000 on a single user query.

Three runtimes per primitive

Every library ships in three runtime forms:

# TypeScript (npm)
npm i @mukundakatta/agentvet @mukundakatta/agentguard @mukundakatta/agentbudget

# Python (PyPI)
pip install agentvet agentguard agentbudget

// MCP server (Claude Desktop config)
{
  "mcpServers": {
    "agentvet": { "command": "npx", "args": ["-y", "@mukundakatta/agentvet-mcp"] },
    "agentguard": { "command": "npx", "args": ["-y", "@mukundakatta/agentguard-mcp"] }
  }
}

The Python and TypeScript surfaces are 1:1 by design — a Python team and a TypeScript team can interoperate on the same primitives. The MCP variant lets a remote LLM use the primitive as a tool.

Composable by inclusion, not by framework

Every primitive has the same shape: a single class or function as the public surface, a typed error carrying retry-friendly context, and an opt-in automatic adapter for popular provider response shapes. Nothing depends on anything else.

Compare with framework-style libraries that bundle reliability concerns: you get them all or none, you adopt their orchestration model, you migrate to their abstractions. agent-stack inverts that.

Why this matters in practice

Healthcare amplifies every reliability concern. A FHIR-querying agent has to be defensive about: only calling sanctioned endpoints, never leaking PHI in logs, abstaining when the right tool is not on the list, validating that a patient_id looks like a real FHIR id, never exceeding a budget on a single patient query, and producing structured outputs the downstream system can parse.

The same pattern applies to any production setting where the cost of an agent silently failing is higher than the cost of a slightly-too-defensive primitive: financial services, internal corporate tools, customer support automation, anything billable.

Artifact paper

The full design rationale, the cross-cutting invariants, and the operational questions that emerge when reliability is split across many small dependencies are documented in a peer-reviewable artifact paper:

Zenodo DOI: 10.5281/zenodo.20074702
HuggingFace Hub: huggingface.co/mukunda1729/agent-stack (HF DOI 10.57967/hf/8720)
Source: github.com/MukundaKatta/agent-stack

The paper is currently under review at the ASE 2026 Tools track. Source for every library is archived in Software Heritage. Every package has CI on GitHub Actions, snapshot tests, and a CITATION.cff.

What's next

The natural next primitive is AgentTrace for cost and latency telemetry. After that, a combined @mukundakatta/agent-stack meta-package that imports all six with sensible defaults for production agents.

If you ship LLM agents in production and you want one of the primitives without buying into a framework, install only the library you need. Each one stands alone.

Source on GitHub. Paper on Zenodo. DOI on HuggingFace. Six primitives, three runtimes, one unified surface for production reliability.

Six Reliability Primitives for LLM Agents

The six primitives

Three runtimes per primitive

Composable by inclusion, not by framework

Why this matters in practice

Artifact paper

What's next

Comments

More from this blog

Most teams running prompt-cached LLM pipelines have no idea what their cache is actually saving them. cachebench tells you.

Most teams find out their RAG pipeline is broken from a complaint. driftvane tells you first.

Agent Memory Is Data Infrastructure: A Hermes Plugin That Takes Deletion Seriously

10 Agentic AI Trends Developers Should Watch in 2026

Command Palette

The six primitives

Three runtimes per primitive

Composable by inclusion, not by framework

Why this matters in practice

Artifact paper

What's next

Comments

More from this blog