Why Most Agentic AI Systems Fail — And How to Fix Them

Lessons from the trenches on building LLM agents that don't flake out.

Nina Johansson | Jun 21, 2026, 03:02 PM | Source: Hacker News

Last Thursday, I watched an AI agent order 47 rubber ducks for a law firm's office. The prompt was simple: 'Order office supplies.' The system interpreted 'supplies' as 'supplies for the office duck pond' — a pond that doesn't exist. This is the state of agentic AI in 2026: brilliant one minute, baffling the next.

Silicon Valley is drunk on 'agents' — autonomous systems that plan, reason, and act. Every startup from Palo Alto to Shenzhen is shipping them. But anyone who's actually deployed one in production knows the dirty secret: they're unreliable. They hallucinate, get stuck in loops, and execute tasks with the consistency of a caffeinated intern.

Martin Fowler's recent piece on building reliable LLM-based agents cuts through the hype. It's a cold shower for an industry that needs one.

Why Agents Fail — The Three-Body Problem

Agentic systems fail in three predictable ways. First, context drift. The agent starts with a clear goal, but as it takes actions and processes results, it forgets what it was doing. It's like calling tech support, being transferred six times, and ending up in billing when you wanted to cancel.

Second, tool misuse. LLMs are terrible at knowing which tool to use when. They'll call a database query when a simple search would do — or worse, try to send an email via a PDF generator. The result: chaos.

Third, failure to recover. When something goes wrong — an API returns a 500, a calculation is off — the agent either blunders ahead or collapses entirely. No self-correction. No fallback. Just a log full of errors.

“The median agent in production today has the reliability of a 12-year-old asked to plan a wedding.” — Martin Fowler, paraphrased but true.

The Oversight Trap

The industry's answer to unreliability is 'human in the loop.' Every action gets a thumbs-up before execution. This works for about five minutes. Then you realize you've just hired a babysitter for a digital teenager. The agent can't act autonomously — so why build it at all?

Real reliability means building systems that can self-correct. That means giving agents feedback loops: a verification step after every action. Did the email send? Was the calculation right? If not, retry or escalate — but don't just sit there blinking.
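To make the feedback loop concrete, here is a minimal sketch of a verify-then-retry-or-escalate wrapper. It is an illustration, not anything taken from Fowler's piece: the names `run_with_verification`, `Outcome`, and `send_email` are hypothetical, and the flaky mail API is simulated.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Outcome:
    status: str    # "ok" or "escalated"
    attempts: int  # how many tries were made


def run_with_verification(
    action: Callable[[], object],
    verify: Callable[[object], bool],
    max_attempts: int = 3,
) -> Outcome:
    """Run an action, check its result, retry on failure, then escalate."""
    for attempt in range(1, max_attempts + 1):
        try:
            result = action()
        except Exception:
            # A raised error (say, an API 500) counts as a failed attempt.
            continue
        if verify(result):
            return Outcome("ok", attempt)
    # Out of retries: hand off to a human instead of blundering ahead.
    return Outcome("escalated", max_attempts)


# Hypothetical flaky action: fails twice, then succeeds.
calls = {"n": 0}


def send_email() -> dict:
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("500 from mail API")
    return {"delivered": True}


outcome = run_with_verification(send_email, lambda r: r.get("delivered") is True)
print(outcome)  # Outcome(status='ok', attempts=3)
```

The point of the design is that a human only gets pulled in on the "escalated" path, not on every action.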

What Working Systems Do Differently

Fowler's article highlights a few patterns that actually work. One is structured outputs. Instead of letting the LLM write free-form responses, force it into templates. JSON schemas, locked-down function calls, rigid formats. It's boring. It works.
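As a rough sketch of what "force it into templates" can look like, the snippet below validates model output against a fixed shape using only the standard library. The schema, the `parse_order` helper, and the quantity cap are assumptions made up for illustration; a real system would more likely lean on a schema library or the model provider's structured-output feature.

```python
import json

# Hypothetical schema for an "order supplies" action: field name -> required type.
ORDER_SCHEMA = {"item": str, "quantity": int, "approved_vendor": str}
MAX_QUANTITY = 20  # assumed business limit, purely illustrative


def parse_order(raw: str) -> dict:
    """Parse model output and reject anything that does not match the schema."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if set(data) != set(ORDER_SCHEMA):
        mismatch = sorted(set(data) ^ set(ORDER_SCHEMA))
        raise ValueError(f"unexpected or missing fields: {mismatch}")
    for field, expected in ORDER_SCHEMA.items():
        value = data[field]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(value, expected) or isinstance(value, bool):
            raise ValueError(f"{field} must be {expected.__name__}")
    if not 1 <= data["quantity"] <= MAX_QUANTITY:
        raise ValueError("quantity out of range")
    return data


good = parse_order('{"item": "stapler", "quantity": 2, "approved_vendor": "acme"}')
print(good["item"])  # stapler
```

Under these assumptions, a free-form reply or an order for 47 rubber ducks raises `ValueError` before anything gets purchased.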

Another is state machines. The agent isn't a free-wheeling chatbot. It's a finite state machine with defined transitions: gather input → validate → act → confirm. If it tries to skip a step, the system stops it.
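The four steps above map naturally onto an explicit transition table. The sketch below is one hypothetical way to write it: the `DONE` state, the option to go back from validate to gather input, and the class names are assumptions, not part of the original description.

```python
from enum import Enum


class State(Enum):
    GATHER_INPUT = "gather_input"
    VALIDATE = "validate"
    ACT = "act"
    CONFIRM = "confirm"
    DONE = "done"  # assumed terminal state


# Allowed transitions. Returning to GATHER_INPUT after a failed
# validation is an assumption added for illustration.
TRANSITIONS = {
    State.GATHER_INPUT: {State.VALIDATE},
    State.VALIDATE: {State.ACT, State.GATHER_INPUT},
    State.ACT: {State.CONFIRM},
    State.CONFIRM: {State.DONE},
    State.DONE: set(),
}


class IllegalTransition(Exception):
    """Raised when the agent tries to skip or reorder a step."""


class AgentWorkflow:
    def __init__(self) -> None:
        self.state = State.GATHER_INPUT
        self.history = [self.state]

    def advance(self, target: State) -> None:
        # Refuse any move that is not in the transition table.
        if target not in TRANSITIONS[self.state]:
            raise IllegalTransition(f"{self.state.value} -> {target.value}")
        self.state = target
        self.history.append(target)


workflow = AgentWorkflow()
try:
    workflow.advance(State.ACT)  # tries to skip validation
except IllegalTransition as err:
    print("blocked:", err)  # blocked: gather_input -> act
```

The model can propose whatever next step it likes; the table, not the model, decides whether the step is allowed.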

And testing. Not just unit tests — adversarial tests. Feed the agent nonsense inputs. Give it contradictions. See if it catches them. Most won't. The ones that do are worth deploying.
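A tiny adversarial harness might look like the sketch below. Everything in it is a stand-in: the two toy "agents" are plain functions rather than real LLM calls, and the test cases and rejection rules are invented to show the shape of the suite, not a recommended rule set.

```python
import re
from typing import Callable, List, Tuple

# Hypothetical adversarial cases: (input text, should the agent reject it?)
ADVERSARIAL_CASES: List[Tuple[str, bool]] = [
    ("", True),                                       # empty input
    ("asdf qwer zxcv", True),                         # nonsense
    ("Order 5 staplers and order 0 staplers", True),  # contradiction
    ("Order 5 staplers", False),                      # sane control case
]


def run_adversarial_suite(
    agent: Callable[[str], str], cases: List[Tuple[str, bool]]
) -> List[str]:
    """Return the inputs where the agent's accept/reject decision was wrong."""
    failures = []
    for text, should_reject in cases:
        rejected = agent(text) == "reject"
        if rejected != should_reject:
            failures.append(text)
    return failures


def naive_agent(text: str) -> str:
    # Stand-in for an agent that happily acts on anything.
    return "accept"


def guarded_agent(text: str) -> str:
    # Stand-in with crude checks: needs an order verb and one consistent quantity.
    quantities = set(re.findall(r"\b\d+\b", text))
    if not text.strip() or "order" not in text.lower() or len(quantities) > 1:
        return "reject"
    return "accept"


print(len(run_adversarial_suite(naive_agent, ADVERSARIAL_CASES)))    # 3
print(len(run_adversarial_suite(guarded_agent, ADVERSARIAL_CASES)))  # 0
```

Swapping the stand-ins for a real agent call keeps the harness unchanged; only the list of cases needs to grow.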

The Cost of Reliability

Building reliable agents isn't cheap. It means more engineering, more testing, more latency. The trade-off is between 'fast and flaky' and 'slow and solid.' For most business applications — order processing, customer support, compliance — solid wins every time.

But the market rewards speed. So we get half-baked agents that ship in weeks and fail in hours. The winners will be the teams that treat reliability as a feature, not an afterthought.

My Take

I've been covering tech long enough to recognize a bubble within a bubble. The agentic AI boom feels like crypto in 2021: everyone's in a hurry to build something that no one has proven they need.

Fowler's piece is a corrective. It says: slow down. Test your assumptions. Build for failure. The agents that last won't be the ones that move fastest — they'll be the ones that don't break when the real world hits them.

Because the real world has rubber ducks. And it's full of supply closets that don't have ponds.
