A year ago, AI agents felt like a research curiosity. Today, we're deploying multi-step agentic workflows in production for clients across finance, operations, and customer experience. The shift has been fast, and the early results are compelling — but there are genuine pitfalls worth understanding before you commit.
What Makes an Agent Different from a Chatbot
A chatbot responds. An agent acts. Give an agent a goal and it will break the goal into steps, use tools to execute each step, evaluate the results, and decide what to do next — looping until the task is complete or it hits a defined boundary. The autonomy is the value proposition and, managed poorly, the risk.
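The loop described above can be sketched in a few lines. This is a minimal illustration, not any particular framework's API: `run_agent`, `Policy`, `RunResult`, and the demo tools are hypothetical names, and the "policy" here is a hard-coded function standing in for a model call.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical types for illustration; not taken from a specific framework.
Tool = Callable[[str], str]
Step = Tuple[str, str, str]  # (tool name, tool input, observation)
# A policy inspects the goal and history and returns (tool name, tool input),
# or None when it judges the task complete. In practice this is a model call.
Policy = Callable[[str, List[Step]], Optional[Tuple[str, str]]]


@dataclass
class RunResult:
    completed: bool
    history: List[Step] = field(default_factory=list)


def run_agent(goal: str, policy: Policy, tools: Dict[str, Tool],
              max_steps: int = 5) -> RunResult:
    """Plan, act, observe, repeat, until done or the step boundary is hit."""
    history: List[Step] = []
    for _ in range(max_steps):
        action = policy(goal, history)
        if action is None:  # the policy decided the goal is met
            return RunResult(True, history)
        name, arg = action
        observation = tools[name](arg)  # execute the chosen tool
        history.append((name, arg, observation))
    return RunResult(False, history)  # defined boundary reached


# Stand-in policy: search first, summarise the result, then stop.
def demo_policy(goal: str, history: List[Step]) -> Optional[Tuple[str, str]]:
    if not history:
        return ("search", goal)
    if len(history) == 1:
        return ("summarise", history[-1][2])
    return None


demo_tools: Dict[str, Tool] = {
    "search": lambda query: f"results for {query}",
    "summarise": lambda text: f"summary of {text}",
}

result = run_agent("vendor pricing", demo_policy, demo_tools)
```

The `max_steps` argument is the "defined boundary": a policy that never returns `None` still terminates, and the caller can see from `completed` that it did not finish on its own.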
Where Agents Are Working in Production Today
The most reliable production use cases we've deployed share common traits: bounded scope, clear success criteria, access to reliable tools, and a human review step before any irreversible action. Three examples: invoice processing agents that extract data, match it to POs, flag discrepancies, and draft payment approvals; research agents that gather competitive intelligence, synthesise it, and produce briefing documents; and support triage agents that classify inbound queries, retrieve relevant knowledge, and draft responses for human review.
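One way to picture the human review step is as a gate in front of anything irreversible. The sketch below is illustrative only: `ProposedAction`, `execute_with_review`, and the `approve` callback are assumed names, and in a real deployment `approve` would be backed by a review queue or ticketing step rather than a lambda.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class ProposedAction:
    description: str
    irreversible: bool  # set by whoever defines the tool, not by the agent


def execute_with_review(
    actions: List[ProposedAction],
    approve: Callable[[ProposedAction], bool],
    perform: Callable[[ProposedAction], None],
) -> Tuple[List[str], List[str]]:
    """Run reversible actions directly; hold irreversible ones until approved."""
    performed: List[str] = []
    held: List[str] = []
    for action in actions:
        if action.irreversible and not approve(action):
            held.append(action.description)  # stays a draft for the reviewer
            continue
        perform(action)
        performed.append(action.description)
    return performed, held


# Hypothetical invoice workflow: only the payment release is irreversible.
proposed = [
    ProposedAction("extract invoice fields", irreversible=False),
    ProposedAction("flag discrepancy", irreversible=False),
    ProposedAction("release payment", irreversible=True),
]
executed_log: List[str] = []
performed, held = execute_with_review(
    proposed,
    approve=lambda action: False,  # reviewer has not signed off yet
    perform=lambda action: executed_log.append(action.description),
)
```

With no approval given, the first two actions run and "release payment" is held, which matches the "draft payment approvals" behaviour described above.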
The Reliability Problem Is Real
Agents fail in non-obvious ways. A single failed tool call can cascade into a meaningless result several steps later. Prompt injection, where content retrieved by the agent attempts to redirect its behaviour, is a live security concern. Agents can also get stuck in loops or take unexpected paths through their tool set. Robust production agents need retry logic, explicit step limits, anomaly detection, and comprehensive logging of every decision and tool call.
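As a rough sketch of two of these safeguards, the wrapper below adds retries with exponential backoff and a log line for every tool call. The names (`call_with_retry`, `ToolCallFailed`, `flaky_lookup`) are hypothetical, and the flaky tool is a stand-in for a real integration. When retries run out it raises instead of returning a placeholder, so a failed call stops the run rather than cascading into later steps.

```python
import logging
import time
from typing import Callable, Optional

logger = logging.getLogger("agent.tools")


class ToolCallFailed(Exception):
    """Raised when a tool still fails after all retry attempts."""


def call_with_retry(name: str, tool: Callable[[str], str], arg: str,
                    max_attempts: int = 3, base_delay: float = 0.5) -> str:
    last_error: Optional[Exception] = None
    for attempt in range(1, max_attempts + 1):
        try:
            result = tool(arg)
            logger.info("tool=%s attempt=%d status=ok", name, attempt)
            return result
        except Exception as exc:  # broad on purpose: any tool error is retried
            last_error = exc
            logger.warning("tool=%s attempt=%d status=error detail=%s",
                           name, attempt, exc)
            if attempt < max_attempts:
                # Exponential backoff: base_delay, 2x, 4x, ...
                time.sleep(base_delay * 2 ** (attempt - 1))
    # Fail loudly so downstream steps never see a made-up value.
    raise ToolCallFailed(
        f"{name} failed after {max_attempts} attempts") from last_error


# Stand-in tool that times out twice, then succeeds.
attempts = {"count": 0}


def flaky_lookup(invoice_id: str) -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TimeoutError("upstream timeout")
    return f"matched {invoice_id}"


result = call_with_retry("po_lookup", flaky_lookup, "INV-1", base_delay=0.0)
```

Step limits belong in the agent loop itself rather than in the tool wrapper, and anomaly detection would typically consume the log lines emitted here.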
The Right Architecture
Don't try to build one agent that does everything. The most reliable pattern we use is a supervisor agent that orchestrates a team of specialised sub-agents, each with a narrow tool set and a well-defined role. This limits blast radius when something goes wrong and makes debugging tractable. Each sub-agent is independently testable, which is essential for maintaining confidence in the system over time.
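The supervisor pattern can be sketched as follows. This is a simplified illustration under stated assumptions: `SubAgent` and `Supervisor` are hypothetical classes, the tools are trivial lambdas, and the plan is a fixed list, whereas a real supervisor would usually decide the routing with a model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class SubAgent:
    role: str
    tools: Dict[str, Callable[[str], str]]  # deliberately narrow tool set

    def handle(self, tool_name: str, payload: str) -> str:
        # A sub-agent can only use tools it was explicitly given.
        if tool_name not in self.tools:
            raise PermissionError(f"{self.role} has no tool {tool_name!r}")
        return self.tools[tool_name](payload)


class Supervisor:
    def __init__(self, agents: Dict[str, SubAgent]) -> None:
        self.agents = agents

    def run(self, plan: List[Tuple[str, str]], payload: str) -> str:
        """Pass the payload through each (role, tool) step in order."""
        for role, tool_name in plan:
            payload = self.agents[role].handle(tool_name, payload)
        return payload


# Hypothetical sub-agents with one tool each.
extractor = SubAgent("extractor", {"extract": lambda text: text.upper()})
matcher = SubAgent("matcher", {"match": lambda ref: f"matched:{ref}"})

supervisor = Supervisor({"extractor": extractor, "matcher": matcher})
outcome = supervisor.run(
    [("extractor", "extract"), ("matcher", "match")], "inv-7")
```

Because each sub-agent is a small object with an explicit tool map, it can be unit-tested in isolation, and a misrouted call such as asking the extractor to `match` fails immediately instead of silently doing the wrong thing.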