Harness Engineering — REALM Framework

The question most people ask when they start working with AI agents is: which model should I use? It is the wrong question. The part that determines whether your agent is reliable, safe, and genuinely useful is everything you build around the model. That is the harness.

The Question Everyone Gets Wrong

The question most people ask when they start working with AI agents is: which model should I use?

It is the wrong question. Not because the model does not matter — it does — but because the model is the least interesting part of what makes an agent actually work. The part that determines whether your agent is reliable, safe, purposeful, and genuinely useful is everything you build around the model. That is the harness.

Harness engineering is the craft of designing and building the scaffolding that surrounds a raw language model — the system prompt, the context it receives, the tools it has access to, the permissions it operates under, the constraints it cannot cross, and the memory it can read and write. None of that comes from the model. All of it comes from you.

Most people are already doing harness engineering. They just do not know that is what they are doing, so they do it without intention — and it shows.

What a Harness Actually Is

Think of a raw language model as something like a very capable generalist who has read most of the world's text. They are intelligent. They can reason, write, analyze, and adapt. But they do not know anything about your business. They do not know their role. They do not know what they are allowed to do. They do not know where to store what they find. They have no memory of what happened yesterday. And unless you have given them specific tools, they cannot do anything beyond generating text.

The harness is the answer to all of those gaps.

A well-engineered harness tells the agent who they are — their role, their personality, their domain of knowledge, their way of thinking. It tells them what they know about the business — the context they need to operate meaningfully. It tells them what they can do — the tools and permissions they have been granted. It tells them what they cannot do — the hard limits they cannot cross. And it gives them a way to read and write to a shared memory so their work persists across sessions.

Without the harness, you have a model. With the harness, you have an agent. The difference is entirely in what you built around it.

REALM Is a Harness Framework

This is not abstract for me. Every Character in my Realm has a harness. I just call the components by different names.

The SOUL file is the core of the harness. It is the system prompt that defines who the Character is — their Class, their role, their knowledge access, their personality, how they think, what they decide alone, and what they bring to the Player. When I deploy a Hunter to research a new market, the model doing the work is one piece. But it is the SOUL file that turns that model into Loge — an intelligence scout who moves into unknown territory, reports raw findings without editorializing, and writes everything back to the Codex.

The Codex is what gives the harness memory. Without the Codex, each session starts from zero. With it, a Character can read the full context of an ongoing Quest, understand what other Characters have already done, and pick up exactly where the last session left off. The Codex is not a nice-to-have. It is the memory layer of every harness in the Realm.

The Autonomy Tiers are the permission layer. They define what each Character is allowed to do autonomously and where the system requires Player approval before anything reaches the outside world. Tier 3 — the tier where actions have real consequences — is the critical layer of the harness, and as I wrote in a previous post, it must be enforced at the platform level, not just written into the SOUL file.

Tool access is the fourth layer. A Bard who writes content does not need access to your financial data. A Hunter who researches competitors does not need the ability to send emails. Minimum necessary access is a harness principle as much as it is a security principle. Every permission you do not grant is a failure mode that does not exist.

These four layers — identity, memory, permissions, and tools — are the harness. REALM gives you a structured way to engineer them for every Character in your crew.

The Industry Is Catching Up to This

For a long time, the idea of writing structured context for AI agents was something individual practitioners figured out on their own. In August 2025, it became a formal open standard.

AGENTS.md — now under the Linux Foundation's Agentic AI Foundation alongside Anthropic's Model Context Protocol — is a simple Markdown convention for giving AI agents persistent, project-specific guidance. It covers build commands, coding conventions, naming rules, testing expectations, and explicit boundaries. By the end of 2025, more than sixty thousand open source repositories had adopted it.

The AGENTS.md standard matters because it validates something REALM has been built on from the start: agents need written context to function reliably, that context should be human-readable and machine-readable at the same time, and the discipline of writing it down is what separates a functional agent from a random one.

REALM's SOUL file does for business agents what AGENTS.md does for coding agents. The principle is identical — write down who the agent is, what they know, how they should operate, and where the limits are. The implementation differs because the domain differs. The engineering discipline is the same.

If you are already thinking about AGENTS.md in your technical work, you already understand harness engineering. The question is whether you are applying that same discipline to the rest of your agents — the ones with access to your calendar, your communications, your content pipeline, and your business data.

Where Most Harnesses Break

There are a few places I see harnesses fail consistently, and they are all predictable.

Identity drift. A SOUL file written in one sentence does not anchor an agent. When the identity layer is thin, the model fills in the gaps with its own defaults — which may or may not match what you need. A Character with a weak SOUL file will behave differently across sessions, drift when asked unusual questions, and collapse under pressure. The identity layer needs to be substantial enough to hold.

Context amnesia. Running an agent without a shared memory layer means every session is session one. The agent cannot build on prior work, cannot maintain awareness of what is happening across the Realm, and cannot be trusted with anything that requires continuity. The Codex solves this. Without something like it, the harness is broken at the memory layer.

Permission bloat. Starting by giving agents broad access and tightening later is backwards. The time to start tight is before the first session, not after something goes wrong. Permissions should be granted one at a time, for specific reasons, when there is a clear need. This is not caution for its own sake — it is good harness engineering.

Confusing the SOUL file with the platform constraint. Writing "always ask before publishing" in a system prompt is not the same as enforcing it at the platform level. A constraint written only in a SOUL file can be overridden. A constraint enforced by the platform cannot. Both layers matter, and they serve different functions. The SOUL file shapes behavior. The platform controls what is actually possible.

A harness built on good intentions but weak implementation is not a harness. It is a suggestion. And an agent that can ignore its suggestions eventually will.

The Craft Nobody Talks About

When people discuss AI agents, they talk about models. They compare benchmarks, context windows, and pricing. They debate which provider to trust. Almost nobody talks about what they are building around the model — even though that is where the real work is.

Harness engineering is not glamorous. It does not come with a launch announcement or a benchmark score. It is the quiet work of writing SOUL files that actually hold, structuring Codex zones that give agents the right context at the right moment, and defining permission boundaries that are enforced where they need to be enforced.

It is also the work that determines whether your Realm functions or falls apart. A strong harness turns a capable model into a trustworthy Character. A weak harness turns even a powerful model into something you cannot rely on.

The model is what you choose. The harness is what you build. Build it with the same care you would give to any other part of your company — because it is the part that everything else runs on.

You Are Not Using AI Agents. You Are Engineering Harnesses.

The Question Everyone Gets Wrong

What a Harness Actually Is

REALM Is a Harness Framework

The Industry Is Catching Up to This

Where Most Harnesses Break

The Craft Nobody Talks About