The AI Stack Is Changing — Here's What Actually Matters

5 March 2026 · ai · agentic-ai · mcp · architecture

Three years ago, the practical vocabulary of applied AI was modest: fine-tune a model, wrap it in an API, ship. Today, building intelligently with AI means navigating a rapidly maturing stack — retrieval, orchestration, autonomy, commerce. Here is our read on where the field actually stands.

01 — Retrieval-Augmented Generation

Language models are powerful, but they are frozen in time and blind to your proprietary data. RAG — Retrieval-Augmented Generation — was the field’s pragmatic answer to this limitation. Instead of retraining or fine-tuning a model every time your knowledge base changes, you maintain an external store of documents and dynamically inject relevant excerpts into each prompt at inference time.

The architecture is deceptively simple: a user query is converted into a dense vector embedding, a nearest-neighbour search retrieves the most semantically similar chunks from a vector database (Pinecone, Weaviate, pgvector), and the retrieved context is prepended to the prompt before the model generates a response. The model never “knows” the document — it reasons over a curated excerpt.
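The retrieve-then-generate flow can be sketched in a few lines. This is a toy illustration only: the bag-of-words `embed` function stands in for a real embedding model, and in production the retrieval would hit a vector database rather than a Python list.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy stand-in for a real embedding model: a bag-of-words vector.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    # Nearest-neighbour search: rank documents by similarity to the query.
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    # Prepend the retrieved excerpts to the prompt before generation.
    context = "\n".join(f"- {d}" for d in retrieve(query, docs))
    return f"Context:\n{context}\n\nQuestion: {query}"

docs = [
    "Our refund policy allows returns within 30 days.",
    "The office is closed on public holidays.",
    "Refunds are issued to the original payment method.",
]
print(build_prompt("refund policy", docs))
```

The model downstream of `build_prompt` only ever sees the curated excerpts, which is exactly why retrieval quality, not model weights, becomes the bottleneck.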

RAG shifts the intelligence bottleneck from model weights to retrieval quality. The model is only as useful as the context you hand it.

What has matured

Early RAG pipelines treated retrieval as a solved problem — run a cosine similarity search, take the top-k results, done. Production experience quickly exposed the naivety of this approach: a dense legal document or a multi-table database schema demands different chunking and retrieval strategies than a flat FAQ does.

The field has responded with several advances. Hybrid search blends dense vector retrieval with traditional BM25 keyword scoring, covering both semantic similarity and exact term matching. Reranking models (cross-encoders) apply a second pass over retrieved candidates to re-score against the original query — typically at the cost of some latency, but with a meaningful lift in precision. Contextual chunking strategies — overlapping windows, document-level summaries alongside fine-grained passages, hierarchical indexing — have become standard toolkit items.
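One widely used way to blend the dense and keyword rankings is reciprocal rank fusion (RRF), which scores each document by summing 1/(k + rank) across the input rankings. The sketch below assumes you already have the two ranked result lists from your vector store and BM25 index; the document IDs are illustrative.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    # Each ranking is a list of document IDs, best first.
    # RRF score: sum over rankings of 1 / (k + rank), with rank starting at 1.
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense = ["doc_a", "doc_b", "doc_c"]    # from vector search
keyword = ["doc_c", "doc_a", "doc_d"]  # from BM25
fused = reciprocal_rank_fusion([dense, keyword])
print(fused)
```

Because RRF operates on ranks rather than raw scores, it sidesteps the awkward problem of normalising cosine similarities against BM25 scores, which live on incompatible scales.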

For teams using PostgreSQL, the pgvector extension offers a low-friction path to adding vector search without introducing a new infrastructure dependency. For higher-throughput applications, dedicated vector stores with HNSW indexing offer significantly better query performance at scale.

Where the frontier sits

The current frontier is what practitioners are calling agentic RAG — rather than a single retrieval step, the model orchestrates multiple retrieval calls, reformulates queries when initial results are weak, and synthesises across heterogeneous sources. This blurs the boundary between retrieval and reasoning, and leads naturally into our next topic.

02 — Agentic AI

The transition from RAG to agentic AI is a transition from answering to acting. A RAG system returns text. An agentic system takes actions in the world: it browses the web, executes code, reads and writes files, calls APIs, spawns sub-agents, and loops until a goal is satisfied.

The canonical primitive is the tool-calling loop. The model is given a system prompt describing available tools (web search, a code interpreter, a database query function) and a high-level objective. On each iteration it decides whether to call a tool, inspects the result, updates its reasoning, and continues — or terminates with a final answer. The model is no longer a function; it is a process.
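The tool-calling loop can be reduced to a small skeleton. Everything here is a stand-in: `fake_model` scripts the decisions a real LLM API call would make, and the tool registry is hypothetical; the shape of the loop is the point.

```python
# Hypothetical tool registry; a real system would describe these tools
# in the model's system prompt.
TOOLS = {
    "search": lambda query: f"3 results for '{query}'",
    "calculate": lambda expression: str(eval(expression, {"__builtins__": {}})),
}

def fake_model(messages: list[dict]) -> dict:
    # Stand-in for an LLM call: scripted decisions, for illustration only.
    if len(messages) == 1:
        return {"tool": "calculate", "args": {"expression": "6 * 7"}}
    return {"final": f"The answer is {messages[-1]['content']}"}

def agent_loop(goal: str, max_steps: int = 5) -> str:
    messages = [{"role": "user", "content": goal}]
    for _ in range(max_steps):
        decision = fake_model(messages)
        if "final" in decision:                               # model terminates
            return decision["final"]
        result = TOOLS[decision["tool"]](**decision["args"])  # execute the tool
        messages.append({"role": "tool", "content": result})  # feed result back
    return "step budget exhausted"

print(agent_loop("what is 6 times 7?"))
```

The `max_steps` budget matters: without it, a model that never emits a final answer loops forever, which is the simplest instance of the reliability problems discussed below.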

ReAct Pattern — Interleave reasoning traces with action calls. The model thinks aloud before each tool use, creating an auditable chain of thought that aids debugging and trust.

Plan-and-Execute — A planner model decomposes the goal into a task graph; worker agents execute individual steps. Reduces error propagation and enables parallelism.

Multi-Agent Systems — Specialised sub-agents (researcher, coder, reviewer) coordinate via a shared message bus. Each agent can have its own tools, memory, and system prompt.

Memory Layers — In-context (window), external episodic (vector DB), procedural (fine-tuned). Effective agents manage all three to avoid re-solving known problems.
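The plan-and-execute pattern reduces, at its core, to topological execution over a dependency graph. The task graph below is a hypothetical example of what a planner model might emit; Python's standard-library `graphlib` handles the ordering.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph a planner model might emit: task -> prerequisites.
plan = {
    "research": set(),
    "draft": {"research"},
    "review": {"draft"},
    "publish": {"review"},
}

def worker(task: str) -> str:
    # Stand-in for a specialised worker agent executing one step.
    return f"{task}: done"

def execute(plan: dict[str, set[str]]) -> list[str]:
    order = TopologicalSorter(plan).static_order()  # respects dependencies
    return [worker(task) for task in order]

print(execute(plan))
```

`TopologicalSorter` also exposes a `get_ready()`/`done()` interface, which is what enables the parallelism the pattern promises: independent tasks can be dispatched to worker agents concurrently.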

The reliability problem

The central engineering challenge in agentic systems is reliability under uncertainty. A single LLM call has failure modes you can enumerate and test. An agent running 20 tool-call iterations has a compounding failure probability that grows with each step. Small prompt ambiguities that would produce a slightly odd single response can snowball into catastrophic misaligned actions over a long trajectory.
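The compounding effect is easy to quantify under a simple independence assumption (real failures are rarely independent, so treat this as an illustration of scale, not a model):

```python
per_step_success = 0.98           # assume 2% failure per tool-call step
steps = 20
overall = per_step_success ** steps
print(f"{overall:.1%}")           # chance the full 20-step trajectory succeeds
# → 66.8%
```

A per-step reliability that would look excellent in a single-call eval leaves a third of long trajectories failing, which is why step count is itself a design parameter.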

The emerging answers here involve structured output schemas to constrain tool call parameters, human-in-the-loop checkpoints at high-stakes decision nodes, sandboxed execution environments for code tools, and investment in evals infrastructure — automated test harnesses that run agents against known scenarios and flag regressions. This is where the gap between demo and production remains widest.
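A minimal sketch of the first of those answers, schema-constrained tool calls: production systems would use JSON Schema or a library such as Pydantic, and the refund tool here is hypothetical, but the principle is the same: reject malformed arguments before they reach a side-effecting tool.

```python
# Hand-rolled stand-in for schema validation of tool-call arguments.
REFUND_TOOL_SCHEMA = {
    "order_id": str,
    "amount": float,
}

def validate_tool_call(args: dict, schema: dict) -> list[str]:
    errors = []
    for field, expected in schema.items():
        if field not in args:
            errors.append(f"missing field: {field}")
        elif not isinstance(args[field], expected):
            errors.append(f"{field}: expected {expected.__name__}")
    for field in args:
        if field not in schema:
            errors.append(f"unexpected field: {field}")
    return errors

print(validate_tool_call({"order_id": "A-1", "amount": "ten"}, REFUND_TOOL_SCHEMA))
```

An empty error list gates execution; anything else is fed back to the model as a correction prompt rather than executed.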

The Model Context Protocol

A notable recent development is Anthropic’s Model Context Protocol (MCP) — an open standard for connecting AI models to external tools and data sources in a uniform way. Rather than every team writing bespoke integration code, MCP defines a common interface: a server exposes capabilities (resources, tools, prompts), a client (the model host) discovers and calls them. Think of it as USB-C for AI tool integrations. The ecosystem of MCP-compatible servers is growing rapidly, from database connectors to version control systems, and we expect it to become a de-facto infrastructure layer for serious agentic deployments.
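To make the "common interface" concrete: MCP is built on JSON-RPC 2.0, with methods such as `tools/list` for discovery and `tools/call` for invocation. The messages below follow the published spec's shape at the time of writing, but check the current protocol revision before relying on field names; the `query_database` tool is a hypothetical example.

```python
import json

# Illustrative MCP-style messages (MCP is layered on JSON-RPC 2.0).
list_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/list",          # client asks the server what it exposes
}
call_request = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",          # client invokes a discovered tool
    "params": {"name": "query_database", "arguments": {"sql": "SELECT 1"}},
}
print(json.dumps(call_request, indent=2))
```

The uniformity is the point: a host that speaks these messages can drive any compliant server, whether it wraps a database, a filesystem, or a version control system.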

03 — Agentic Commerce Platforms

Perhaps the most commercially consequential application of agentic AI is in e-commerce. The shopping journey — discovery, comparison, configuration, purchase, post-sale support — has historically been a sequence of discrete human-driven steps across fragmented interfaces. Agentic platforms are beginning to collapse this into a single conversational flow.

The shift is architecturally significant. Traditional e-commerce personalisation operates on rules and collaborative filtering applied passively at page-render time. Agentic commerce operates on intent, inferred in real time from natural language, and drives a dynamic, multi-step process on the user’s behalf.

What this looks like in practice

A user describes what they need — not by navigating category trees, but in plain language. The agent disambiguates intent through follow-up questions, executes a structured product search against catalogue APIs, compares results against stated constraints, surfaces a curated shortlist with explanatory reasoning, and can proceed to configure and initiate a transaction — all without the user leaving the conversation.
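The "compare results against stated constraints" step is ordinary structured filtering once intent has been extracted into fields. The catalogue rows and constraint names below are invented for illustration; a real agent would populate both from catalogue API responses and the conversation.

```python
# Hypothetical catalogue rows and extracted user constraints.
catalogue = [
    {"name": "Trailrunner X", "price": 120, "waterproof": True},
    {"name": "CityWalk Lite", "price": 60, "waterproof": False},
    {"name": "PeakPro", "price": 210, "waterproof": True},
]
constraints = {"max_price": 150, "waterproof": True}

def shortlist(items: list[dict], constraints: dict) -> list[dict]:
    # Keep only items satisfying every stated constraint.
    return [
        item for item in items
        if item["price"] <= constraints["max_price"]
        and item["waterproof"] == constraints["waterproof"]
    ]

print(shortlist(catalogue, constraints))
```

The hard part is upstream, turning "something sturdy for wet trails, not too expensive" into that `constraints` dict; the filtering itself is deliberately boring.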

The catalogue and the checkout are becoming API endpoints. The interface layer is a conversation. The intelligence layer is an agent.

Emerging platform capabilities

Several commercial platforms and frameworks are building explicitly for this model. Vendors are exposing commerce-native tool schemas — structured APIs for inventory queries, availability checks, cart operations, and order management that agents can call reliably. On the AI side, shopping-specialised models fine-tuned on product data and commercial intent signals are beginning to appear alongside general-purpose models.

The deeper integration challenge is trust and delegation. Users are comfortable asking an agent for recommendations. They are less uniformly comfortable with an agent completing a purchase on their behalf. The design of confirmation checkpoints, budget guardrails, and reversibility primitives — the ability to easily undo an agent’s action — will likely determine the pace of adoption as much as the underlying model capability.
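A budget guardrail of the kind described can be as simple as routing any purchase above a delegated limit to a human confirmation step. The sketch below is a minimal illustration of that checkpoint pattern; the limit, item, and return strings are all invented.

```python
# Sketch of a budget guardrail: purchases above a delegated limit are held
# for human confirmation rather than executed directly by the agent.
BUDGET_LIMIT = 100.0

def attempt_purchase(item: str, price: float, confirmed: bool = False) -> str:
    if price > BUDGET_LIMIT and not confirmed:
        return f"HOLD: '{item}' at {price:.2f} exceeds limit, awaiting confirmation"
    return f"PURCHASED: {item}"

print(attempt_purchase("noise-cancelling headphones", 249.0))
print(attempt_purchase("noise-cancelling headphones", 249.0, confirmed=True))
```

Reversibility primitives belong at the same layer: the system records enough about each agent action (order ID, prior state) to undo it, so the cost of a wrong action is bounded rather than catastrophic.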

Implications for platform builders

If you are building or maintaining a commerce platform today, the architectural question worth asking is: how would an AI agent interact with my system? APIs designed for human browser sessions — paginated HTML, session cookies, CAPTCHA-gated flows — are not agent-friendly. Investing now in clean, structured, programmatic APIs for your core commerce operations is both a technical necessity for agent integration and generally good engineering practice.

04 — The Convergence

These three developments — RAG, agentic orchestration, and agentic commerce — are not separate trends. They are layers of a single stack converging toward the same destination: AI systems that can reason over real-world knowledge, take goal-directed action across real-world tools, and operate within real-world commercial contexts.

RAG provides the knowledge retrieval layer. Agentic orchestration provides the action and reasoning layer. Domain-specific applications — commerce, enterprise workflow, developer tooling — provide the context in which those capabilities produce business value. Organisations that invest in all three layers, and in the infrastructure to connect them reliably, are building durable advantage.

The engineering challenges that remain are real: reliability at scale, evaluation and testing, safe delegation of authority, latency under complex multi-step workflows. But the trajectory is clear. The question for engineering teams is not whether to engage with this stack, but how to engage with it thoughtfully — with appropriate investment in evals, guardrails, and infrastructure — rather than treating it as a prototyping playground.