OpenAI's Agents SDK Gets Sandboxes and a Harness — Finally

OpenAI updates its Agents SDK with sandboxing and an in-distribution harness for frontier models. Here's why that matters and what's still missing.

16 April 2026 ai-agents openai sdk safety

OpenAI just shipped a meaningful update to its Agents SDK: sandboxed execution environments and an in-distribution harness for frontier models. If you’ve been building production agents, you know this isn’t a nice-to-have — it’s table stakes.

What’s new

The two headline features are straightforward:

Sandboxing. Agents can now operate in controlled computer environments, accessing only the files and tools approved for a particular operation. This is the “don’t let the agent delete your production database” feature, and honestly it should have been there from day one. Running unsupervised agents without sandboxing is the kind of thing that makes security engineers wake up in a cold sweat.

In-distribution harness. This is the more interesting one. The harness — everything around the model that handles tool calls, file access, and execution — is now bundled and tested alongside the frontier models it’s designed to work with. You get a known-good combination of model + execution environment, tested together, rather than cobbling together your own harness and hoping it works.
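To make "harness" concrete: it's the plumbing that registers tools, validates the model's tool calls, dispatches them, and reports errors back instead of crashing. A toy version might look like this (again, illustrative names only, assuming nothing about the SDK's real interfaces):

```python
import json


class Harness:
    """Toy harness: registers tools, validates the model's tool calls,
    and dispatches them. Names are illustrative, not the SDK's API."""

    def __init__(self):
        self.tools = {}

    def tool(self, fn):
        # Decorator that registers a plain function as a callable tool.
        self.tools[fn.__name__] = fn
        return fn

    def dispatch(self, call_json):
        # The model emits a JSON tool call; the harness validates and runs it.
        call = json.loads(call_json)
        name, args = call["name"], call.get("arguments", {})
        if name not in self.tools:
            return {"error": f"unknown tool: {name}"}
        try:
            return {"result": self.tools[name](**args)}
        except TypeError as e:
            # Malformed arguments go back to the model as an error,
            # not up the stack as a crash.
            return {"error": str(e)}


harness = Harness()


@harness.tool
def add(a, b):
    return a + b
```

Testing this layer *together with* a specific model is the point: whether `dispatch` handles the exact argument shapes that model emits is precisely what an in-distribution harness is supposed to guarantee.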

Karan Sharma from OpenAI’s product team described the goal as enabling “long-horizon agents” — the complex, multi-step tasks where agents need to operate for extended periods without human babysitting. That’s the hard problem. Anyone can build a single-shot agent. Keeping one running reliably for 50 steps is where things get ugly.

Why this matters

The agent SDK space is crowded right now. Anthropic has its own tool use and agent patterns. Google’s been pushing Gemini’s function calling. Smaller players like LangChain and CrewAI are fighting for mindshare. What separates the serious players from the toy builders is how they handle the unsexy infrastructure problems: sandboxing, error recovery, state management, and safe execution boundaries.

OpenAI is acknowledging that the model is only part of the equation. The harness matters just as much. If you’ve ever debugged an agent that got stuck in a loop because its tool call format drifted slightly from what the model expected, you know exactly what I’m talking about.
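The stuck-in-a-loop failure mode is common enough that many hand-rolled harnesses grow a guard for it. A minimal sketch of the idea (my own illustration, not anything shipped in the SDK):

```python
from collections import deque


class LoopGuard:
    """Illustrative guard: abort if the agent repeats the identical tool
    call N times in a row, a common symptom of drifted call formats."""

    def __init__(self, max_repeats=3):
        self.max_repeats = max_repeats
        self.recent = deque(maxlen=max_repeats)

    def check(self, call):
        # Record the call; raise once the last N calls are all identical.
        self.recent.append(call)
        if len(self.recent) == self.max_repeats and len(set(self.recent)) == 1:
            raise RuntimeError(
                f"agent repeated {call!r} {self.max_repeats} times in a row"
            )
```

Trivial to write, easy to forget, and exactly the kind of guardrail you'd hope a tested harness bundles by default.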

What’s still missing

Python only, for now. TypeScript support is “planned for a later release,” which is OpenAI’s way of saying “not yet.” Given how much of the agent ecosystem runs on TypeScript — including Anthropic’s Claude Code — this is a real gap. MCP (the Model Context Protocol), Anthropic’s plugin architecture, and most of the tooling around agent orchestration are JavaScript-first. OpenAI is arriving at the party speaking the wrong language.

Code mode and subagents are also on the roadmap but not yet available. Subagents in particular are critical for the composable agent stacks that are starting to emerge. The pattern of one orchestrator agent delegating to specialist subagents is becoming the default architecture for non-trivial work. Without it, the SDK is limited to single-agent patterns.
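For readers who haven't built one: the orchestrator/subagent pattern is just delegation with routing. Here's a deliberately minimal sketch of the shape (hypothetical classes; real orchestrators route with the model, not keyword matching):

```python
class Agent:
    """Minimal stand-in for a specialist agent."""

    def __init__(self, name, handler):
        self.name = name
        self.handler = handler

    def run(self, task):
        return self.handler(task)


class Orchestrator:
    """Illustrative orchestrator: routes each task to a specialist
    subagent. Keyword routing stands in for model-driven delegation."""

    def __init__(self, subagents):
        self.subagents = subagents

    def run(self, task):
        for keyword, agent in self.subagents.items():
            if keyword in task.lower():
                return agent.run(task)
        return "no subagent matched"


researcher = Agent("researcher", lambda t: f"research notes for: {t}")
coder = Agent("coder", lambda t: f"patch for: {t}")
orchestrator = Orchestrator({"research": researcher, "fix": coder})
```

The pattern composes: each subagent can itself be an orchestrator. That's what the SDK can't express today, and why the gap matters.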

The bigger picture

This update is OpenAI playing catch-up with where the market already is. Anthropic’s Claude Code has had robust tool use and agent patterns for months. The Cursor/Claude Code/Codex stack that’s emerging isn’t waiting for any single provider’s SDK to bless their architecture. OpenAI is building the infrastructure that should have existed when they launched the SDK, and they’re doing it reactively.

That said, getting sandboxing right is genuinely hard. The fact that OpenAI is shipping it as a first-class feature rather than leaving it as an exercise for the developer is a signal that they’re taking production agent deployment seriously. The harness integration is similarly non-trivial — testing model + execution environment as a unit rather than assuming they’ll work together is the kind of boring, important work that prevents 3am incidents.

Bottom line

If you’re building agents on OpenAI models, this update makes your life better. Sandboxing protects you from yourself. The harness gives you a tested, supported execution path. Both are things you’d end up building yourself anyway, and now you don’t have to.

But if you’re choosing a platform today based on where the ecosystem is going, the Python-only limitation and the missing subagent support are real constraints. The agent space is moving fast, and “planned for a later release” doesn’t ship code.