Field notes for engineers building agentic software.
Essays on multi-agent delivery, security failures, and the architectural trade-offs that
matter when systems leave the demo stage.
7 May 2026 News
OpenAI's GPT-5.5 Instant Is Really a Memory Product
OpenAI's new default ChatGPT model is pitched as faster and less hallucinatory, but the more important shift is how much of the product now depends on memory, sources and model replacement discipline.
MCP's 97 million installs show agent infrastructure is settling around boring standards
Anthropic's Model Context Protocol has reportedly reached 97 million monthly SDK installs. The number matters less than the direction: agents need a shared integration layer.
Stripe’s new Link wallet for AI agents looks like a payments story. It is really a trust and identity story, and that makes it more important than most agent shopping demos.
NVIDIA’s Nemotron Push Is a Bet That Agents Need Fewer Models, Not More
NVIDIA’s new Nemotron 3 Nano Omni model is notable less for the benchmark claims than for the architectural argument behind it: multimodal agents become more useful when perception is consolidated instead of stitched together from separate models.
Google Wants Research Agents to Be Infrastructure, Not a Chat Tab
Deep Research Max matters less as a flashy research assistant and more as a signal that Google wants long-running agent work to sit inside a proper application runtime.
OpenClaw Security: Tokens, Plugins, and Safe Configuration
OpenClaw gets useful the moment it can touch real systems. That is also the moment you need to stop treating configuration as a quick setup chore and start treating it like production security.
Cursor at $50B: The IDE Is Dead, Long Live the Control Plane
Cursor is raising $2B at a $50B+ valuation while simultaneously rebuilding its product around agent orchestration. The IDE is becoming a fallback. That's the real story.
Kimi K2.6: The Open-Source Model That Makes Claude Look Expensive
Moonshot AI released Kimi K2.6 with 80.2% on SWE-Bench Verified, 256K context, native video input, and an 88% cost advantage over Claude Opus 4.7. It's not better than Opus, but for production coding workloads, it might not need to be.
Google Thinks Agents Need Their Own Enterprise Stack
Google's Gemini Enterprise Agent Platform is an attempt to turn agents from a model feature into managed enterprise infrastructure. The comparison with OpenClaw, NemoClaw, Hermes and Microsoft 365 Agents shows just how quickly the market is splitting into distinct layers.
Anthropic launched Claude Design this week - a visual creation tool that generates websites, landing pages, and presentations from prompts. It's not another AI feature in an existing design tool. It replaces the starting point.
Anthropic Labs ships Claude Design: a visual design tool powered by Opus 4.7 that generates prototypes, slides, and marketing collateral from natural language. The design tool market just got more crowded.
Windsurf 2.0 - When Your IDE Becomes a Control Tower
Windsurf 2.0 ships an Agent Command Center and native Devin integration. The IDE is no longer where you write code. It's where you direct the agents that write code.