Building · 4 min read · June 17, 2026

Code Where You Can, LLM Where You Can't

The design rule that made my deal-screening agent both better and cheaper: use plain code for the exact work, and spend the model only on real ambiguity.

I spent weeks trying to teach plain code to tell whether 2 messy records described the same business. Different spellings, abbreviations, a DBA name on one and a legal name on the other, a stray LLC. I threw everything I had at it: fuzzy matching, string normalization, every trick I could find.

I could never get past about 80%.

That last 20% was a judgment problem, not something code could settle. "Bob's HVAC" and "Robert Smith Heating & Air" might be the same company or might be 2 competitors on the same street. No amount of normalization tells you which. You have to actually reason about it.

So I left the code in place and bolted an LLM on top for just the ambiguous cases, basically asking it in plain language, "are these the same thing?"

Solved. And cheap, because the model only sees the cases the code couldn't settle. The other 80% never touches it.
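
Here is a rough sketch of that shape, not the actual implementation. It assumes a similarity score from Python's standard `difflib`, 2 made-up cutoffs, a made-up suffix list, and a hypothetical `ask_model` callable standing in for whatever LLM call you use. Code settles the clear cases at both ends and only the middle band reaches the model.

```python
import re
from difflib import SequenceMatcher
from typing import Callable

# Hypothetical suffix list and cutoffs, chosen for illustration only.
LEGAL_SUFFIXES = {"llc", "inc", "corp", "co", "ltd"}
SAME_AT_OR_ABOVE = 0.92
DIFFERENT_AT_OR_BELOW = 0.30


def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and drop legal suffixes."""
    words = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(w for w in words if w not in LEGAL_SUFFIXES)


def same_business(a: str, b: str, ask_model: Callable[[str, str], bool]) -> bool:
    """Settle clear cases in code; defer only the ambiguous middle to the model."""
    score = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
    if score >= SAME_AT_OR_ABOVE:
        return True   # near-identical after cleanup: no model call
    if score <= DIFFERENT_AT_OR_BELOW:
        return False  # nothing in common: no model call
    # Everything in between is a judgment call, so it goes to the model.
    return ask_model(a, b)
```

Where the cutoffs sit is a tuning question. A pair like "Bob's HVAC" and "Robert Smith Heating & Air" shares very little text, so a real system would likely need a low "different" cutoff or extra signals such as address or phone to keep pairs like that in the model's pile.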

The rule I pulled out of that

Write code where you can, spend the model where you can't.

That split became how I build everything now. Anything that has to be exact lives in deterministic code: math, dates, account numbers, thresholds, anything with a right answer. Code runs the same way every time, costs next to nothing per run, and never hallucinates.

The model gets the genuinely ambiguous work. The stuff that has no clean rule:

  • Reading a poorly written broker email and pulling out what they actually mean
  • Judging whether a clause in a contract is unusual
  • Summarizing what a customer is really asking for under the noise
  • Deciding whether 2 records are the same business

If your agent is doing arithmetic, you've already got a bug waiting. The model will give you a number. It just won't always give you the same number, and it has no idea when it's wrong.
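
As an illustration of keeping arithmetic in code, here is a minimal sketch of an invoice total using Python's standard `decimal` module. The function name, the `(quantity, unit_price)` input shape, and the half-up rounding rule are all assumptions made for the example.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import Iterable, Tuple

CENT = Decimal("0.01")


def invoice_total(line_items: Iterable[Tuple[int, str]], tax_rate: str) -> Decimal:
    """Sum (quantity, unit_price) pairs and add tax rounded to the cent.

    Prices and the tax rate are passed as strings so they become exact
    Decimals instead of binary floats.
    """
    subtotal = sum(
        (Decimal(qty) * Decimal(price) for qty, price in line_items),
        Decimal("0"),
    )
    # Assumed rounding rule: tax is rounded half-up to the nearest cent.
    tax = (subtotal * Decimal(tax_rate)).quantize(CENT, rounding=ROUND_HALF_UP)
    return subtotal + tax
```

The same inputs give the same total on every run, and a unit test can pin that total down. Neither is true of a number a model produced.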

How Marcus got better and cheaper at the same time

My deal-screening agent Marcus choked earlier this year. He was sloppy and expensive, and I couldn't trust his output without re-checking it, which defeats the point of having him.

The fix was exactly this surgery. I pulled the parts that needed to be exact out of the model's hands and into code, and left him only the judgment calls. He went from a poor performer to a trusted employee. The bill dropped too, because I stopped paying a frontier model to do long division.
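
The article doesn't say what Marcus actually checks, so the sketch below is purely illustrative. The `Deal` fields, the 15% margin threshold, and the `read_email` callable (a stand-in for the model call) are all hypothetical. The point is the division of labor: code computes and compares the numbers, and the model only reads the messy text.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Callable, Optional

MIN_EBITDA_MARGIN = Decimal("0.15")  # hypothetical screening threshold


@dataclass
class Deal:
    revenue: Decimal      # hypothetical field
    ebitda: Decimal       # hypothetical field
    broker_email: str     # free text, the part that needs judgment


@dataclass
class ScreenResult:
    margin: Decimal
    passes: bool
    summary: Optional[str]


def screen(deal: Deal, read_email: Callable[[str], str]) -> ScreenResult:
    """Do the exact checks in code and call the model only for the reading."""
    if deal.revenue <= 0:
        raise ValueError("revenue must be positive")
    margin = deal.ebitda / deal.revenue
    passes = margin >= MIN_EBITDA_MARGIN
    # Assumed design choice: deals that fail the exact checks never reach
    # the model, so they cost nothing to reject.
    summary = read_email(deal.broker_email) if passes else None
    return ScreenResult(margin=margin, passes=passes, summary=summary)
```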

The common failure mode is the opposite instinct, and I had it too at first. You get excited about the latest model and route everything through it, including the arithmetic. Then the invoice total gets creative. The model did its job. It just got handed the wrong one.

Boring is the feature

I think the mistake is treating the LLM as the product instead of one component in it. The model is the expensive, brilliant, occasionally unreliable part. You want it doing as little as possible, and only the part nothing else can do.

Everything around it should be predictable plumbing you can read, test, and trust.

A good agent is mostly boring code surrounding a well-supervised brain.