Five mistakes to avoid when building an AI agent

POCs that never reach production share an architecture problem, not a model problem.

The demo looks great and everyone is excited. Then production hits: hallucinations, blowups, runaway costs. It's the same five mistakes, in the same order.

1. One mega-prompt that does everything

Pile ten jobs into one LLM call and each one gets done worse. Extract → analyze → decide → write: break them into separate, well-scoped calls.

Practical test

If reading your prompt aloud leaves you breathless, it's too long.
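
The split described in this section can be sketched as a chain of small calls. This is a minimal sketch, not a reference implementation: `LlmCall` is a hypothetical stand-in for whatever model client you use, and the step names and prompts are illustrative assumptions.

```typescript
// Hypothetical: any function that sends one prompt to a model and returns text.
type LlmCall = (prompt: string) => Promise<string>;

// One narrow job per step, each with its own short prompt.
interface Step {
  name: string;
  buildPrompt: (input: string) => string;
}

// Illustrative prompts only; real ones would be tuned per step.
const steps: Step[] = [
  { name: "extract", buildPrompt: (t) => `Extract the key facts from:\n${t}` },
  { name: "analyze", buildPrompt: (t) => `Analyze these facts:\n${t}` },
  { name: "decide", buildPrompt: (t) => `Choose one action based on this analysis:\n${t}` },
  { name: "write", buildPrompt: (t) => `Write a reply that carries out this decision:\n${t}` },
];

interface TraceEntry {
  step: string;
  output: string;
}

// Run the steps in order, feeding each output into the next step.
// The trace keeps every intermediate result so a weak step is easy to spot.
async function runPipeline(
  llm: LlmCall,
  input: string,
  pipeline: Step[] = steps
): Promise<{ output: string; trace: TraceEntry[] }> {
  const trace: TraceEntry[] = [];
  let current = input;
  for (const step of pipeline) {
    current = await llm(step.buildPrompt(current));
    trace.push({ step: step.name, output: current });
  }
  return { output: current, trace };
}
```

Because every step is a separate call, each one can get its own prompt, its own eval cases, and if needed its own model.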

2. Treating RAG as "embed everything, fetch nearest"

Naive RAG, where you embed everything and fetch the nearest chunks, is your worst enemy. What you actually need:

  1. Hybrid retrieval: vector + BM25 keyword, combined.
  2. Reranker: pull top 50, drop to top 5 with a cross-encoder.
  3. Context packing: pass chunks to the model with citations and source metadata.
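
The three steps above can be sketched as plain functions. The assumptions are loud here: reciprocal rank fusion is one common way to combine a vector ranking with a BM25 ranking (the post does not say which method to use), `RelevanceScorer` is a hypothetical stand-in for a real cross-encoder, and the `Chunk` shape is illustrative.

```typescript
interface Chunk {
  id: string;
  text: string;
  source: string;
}

// Step 1: combine ranked id lists (e.g. one from vector search, one from BM25).
// Reciprocal rank fusion: each list contributes 1 / (k + rank) per document.
function fuseRankings(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + index + 1));
    });
  }
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1])
    .map((entry) => entry[0]);
}

// Hypothetical stand-in for a cross-encoder: higher means more relevant.
type RelevanceScorer = (query: string, text: string) => number;

// Step 2: score every candidate against the query and keep only the best few.
function rerank(
  query: string,
  candidates: Chunk[],
  score: RelevanceScorer,
  keep = 5
): Chunk[] {
  return candidates
    .map((chunk) => ({ chunk, relevance: score(query, chunk.text) }))
    .sort((a, b) => b.relevance - a.relevance)
    .slice(0, keep)
    .map((entry) => entry.chunk);
}

// Step 3: number each chunk and attach its source so the model can cite it.
function packContext(chunks: Chunk[]): string {
  return chunks
    .map((c, i) => `[${i + 1}] (source: ${c.source})\n${c.text}`)
    .join("\n\n");
}
```

In use, you would fuse the two id lists, load the top 50 chunks, call `rerank` with `keep = 5`, and pass the result of `packContext` into the prompt.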

3. Shipping without evals

An eval set is the only way to catch behavior drift. Keep 20–50 golden cases and rerun them on every change:

  • Tweak a prompt → run evals, never ship a regression.
  • Upgrade a model → run evals, verify it's actually better.
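
// loadEvalCases, agent and scoreOne are assumed to be defined elsewhere in your project.
// Run every golden case through the agent and collect expected vs. actual output.
const cases = await loadEvalCases();
const results = await Promise.all(
  cases.map(async (c) => ({
    input: c.input,
    expected: c.expected,
    actual: await agent.run(c.input),
  }))
);
// Score each case and print one row per result.
console.table(results.map((r) => scoreOne(r)));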
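
The snippet above calls `scoreOne` without showing it. One possible shape is sketched below. It assumes string outputs and treats an exact match after trimming and lowercasing as a pass, which is a simplification; many agents need looser matching or a rubric. `passRate` and `isRegression` are hypothetical helpers for gating a release.

```typescript
interface EvalResult {
  input: string;
  expected: string;
  actual: string;
}

interface ScoredResult extends EvalResult {
  pass: boolean;
}

// Assumption: case and surrounding whitespace do not matter for a match.
const normalize = (s: string): string => s.trim().toLowerCase();

// Mark a single case as pass or fail.
function scoreOne(result: EvalResult): ScoredResult {
  return {
    ...result,
    pass: normalize(result.actual) === normalize(result.expected),
  };
}

// Fraction of cases that passed, between 0 and 1 (0 for an empty set).
function passRate(scored: ScoredResult[]): number {
  if (scored.length === 0) return 0;
  return scored.filter((r) => r.pass).length / scored.length;
}

// Gate: a prompt tweak or model upgrade is a regression
// when it scores below the last known baseline.
function isRegression(baselineRate: number, candidateRate: number): boolean {
  return candidateRate < baselineRate;
}
```

Storing the baseline pass rate next to the eval cases lets CI reject a change automatically when `isRegression` returns true.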

4. Designing tool calling as "one endpoint, one tool"

An agent looking at 50 tools gets lost. Consolidate them: instead of a narrow find_customer, expose search_customers(filter, fields). Aim for few, flexible tools.
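
A minimal sketch of what such a consolidated tool could look like. Only the name `search_customers(filter, fields)` comes from the text above; the `Customer` fields, the exact-match filter semantics, and the in-memory array standing in for a real data store are all assumptions.

```typescript
// Hypothetical customer record; real fields will differ.
interface Customer {
  id: string;
  name: string;
  email: string;
  plan: "free" | "pro";
}

// Any subset of fields; a customer matches when every given field is equal.
type CustomerFilter = Partial<Customer>;

// In-memory stand-in for a database or CRM API.
const customers: Customer[] = [
  { id: "c1", name: "Ada", email: "ada@example.com", plan: "pro" },
  { id: "c2", name: "Bo", email: "bo@example.com", plan: "free" },
];

// One flexible tool instead of find_by_id, find_by_email, list_pro_customers, ...
// `fields` limits the output so the agent only pays tokens for what it asked for.
function search_customers(
  filter: CustomerFilter,
  fields: (keyof Customer)[]
): Record<string, unknown>[] {
  const filterKeys = Object.keys(filter) as (keyof Customer)[];
  return customers
    .filter((c) => filterKeys.every((key) => c[key] === filter[key]))
    .map((c) => {
      const row: Record<string, unknown> = {};
      for (const field of fields) row[field] = c[field];
      return row;
    });
}

// Example call the agent might make: ids and emails of all "pro" customers.
const proContacts = search_customers({ plan: "pro" }, ["id", "email"]);
```

The same tool answers lookups by id, by email, or by plan, so the agent chooses arguments rather than choosing among dozens of near-identical tools.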

5. Bolting human-in-the-loop on after the fact

If an approval layer isn't in the design from day one, you'll be writing if-trees forever. Build an explicit approval queue for material actions (send email, take payment, archive customer).
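
A minimal sketch of such a queue, assuming an in-memory store and a hypothetical `execute` callback that performs the action. The three material action kinds come from the examples above; `lookup_customer` is added only to illustrate an action that needs no approval.

```typescript
type ActionKind =
  | "send_email"
  | "take_payment"
  | "archive_customer"
  | "lookup_customer";

interface Action {
  kind: ActionKind;
  payload: Record<string, unknown>;
}

// Material actions wait for a human; everything else runs immediately.
const MATERIAL_KINDS: ActionKind[] = ["send_email", "take_payment", "archive_customer"];

type SubmitResult = { status: "executed" } | { status: "queued"; id: number };

class ApprovalQueue {
  private pending = new Map<number, Action>();
  private nextId = 1;
  private execute: (action: Action) => void;

  constructor(execute: (action: Action) => void) {
    this.execute = execute;
  }

  // Called by the agent for every action it wants to take.
  submit(action: Action): SubmitResult {
    if (MATERIAL_KINDS.indexOf(action.kind) === -1) {
      this.execute(action);
      return { status: "executed" };
    }
    const id = this.nextId++;
    this.pending.set(id, action);
    return { status: "queued", id };
  }

  // Called by a human reviewer; returns false for unknown or already handled ids.
  approve(id: number): boolean {
    const action = this.pending.get(id);
    if (!action) return false;
    this.pending.delete(id);
    this.execute(action);
    return true;
  }

  // Drop the action without running it.
  reject(id: number): boolean {
    return this.pending.delete(id);
  }

  pendingCount(): number {
    return this.pending.size;
  }
}
```

The agent only ever calls `submit`, so the decision about what needs approval lives in one place instead of in scattered if-trees.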

Cost warning

An agent without a control plane can burn the monthly API budget overnight. Rate limits and per-task budgets belong in the infrastructure the agent runs on, not in the agent's own logic.
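
A per-task budget can be sketched as a small guard that every model call has to pass through. This covers only the budget half of the warning, not rate limiting. `TaskBudget` is a hypothetical name, amounts are tracked in integer cents to avoid floating-point drift, and the caller is assumed to know or estimate the cost of each call.

```typescript
class TaskBudget {
  private spentCents = 0;
  private limitCents: number;

  constructor(limitCents: number) {
    this.limitCents = limitCents;
  }

  // Record a charge if it fits; refuse it (and record nothing) if it does not.
  tryCharge(costCents: number): boolean {
    if (costCents < 0) return false;
    if (this.spentCents + costCents > this.limitCents) return false;
    this.spentCents += costCents;
    return true;
  }

  remainingCents(): number {
    return this.limitCents - this.spentCents;
  }
}

// Wrap any call so that it only runs while the task still has budget.
// Returns null instead of calling when the charge would exceed the limit.
async function callWithinBudget<T>(
  budget: TaskBudget,
  estimatedCostCents: number,
  call: () => Promise<T>
): Promise<T | null> {
  if (!budget.tryCharge(estimatedCostCents)) return null;
  return call();
}
```

A `null` result is the signal for the orchestrator to stop the task or escalate to a human rather than retry.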


Five mistakes, one theme: build the agent like software. Tests, monitoring, budgets, rollback. The discipline you apply to every other production system applies here too.
