Code Where You Can, LLM Where You Can't
The design rule that made my deal-screening agent both better and cheaper: use plain code for the exact work, and spend the model only on real ambiguity.
I spent weeks trying to teach plain code to tell whether 2 messy records described the same business. Different spellings, abbreviations, a DBA name on one and a legal name on the other, a stray LLC. I threw everything I had at it: fuzzy matching, string normalization, every trick I could find.
I could never get the code to settle more than about 80% of the pairs.
That last 20% was a judgment problem, not something code could settle. "Bob's HVAC" and "Robert Smith Heating & Air" might be the same company or might be 2 competitors on the same street. No amount of normalization tells you which. You have to actually reason about it.
So I left the code in place and bolted an LLM on top for just the ambiguous cases. Basically asking it, in plain language, "are these the same thing?"
Solved. And cheap, because the model only sees the cases the code couldn't settle. The other 80% never touches it.
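Here is a minimal sketch of that split in Python. It is not the actual code behind my agent. The `Record` fields, the suffix list, the two thresholds, and the `ask_model` callable are all assumptions made up for illustration. Code settles the obvious pairs, and only the leftovers get escalated.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable

# Assumed values for illustration only; tune them against your own data.
LEGAL_SUFFIXES = {"llc", "inc", "corp", "co", "ltd"}
SAME_AT_OR_ABOVE = 0.92      # names this similar are treated as the same business
DIFFERENT_AT_OR_BELOW = 0.50  # names this different, on different streets, are not


@dataclass(frozen=True)
class Record:
    name: str
    street: str


def normalize(name: str) -> str:
    """Lowercase, drop punctuation and legal suffixes such as a stray LLC."""
    name = name.lower().replace("'", "").replace("&", " and ")
    tokens = re.sub(r"[^a-z0-9 ]", " ", name).split()
    return " ".join(t for t in tokens if t not in LEGAL_SUFFIXES)


def same_business(
    a: Record,
    b: Record,
    ask_model: Callable[[Record, Record], bool],  # hypothetical LLM wrapper
) -> tuple[bool, str]:
    """Return (verdict, who_decided), where who_decided is 'code' or 'model'."""
    score = SequenceMatcher(None, normalize(a.name), normalize(b.name)).ratio()
    same_street = a.street.strip().lower() == b.street.strip().lower()

    if score >= SAME_AT_OR_ABOVE:
        return True, "code"
    if score <= DIFFERENT_AT_OR_BELOW and not same_street:
        return False, "code"
    # Everything else is a judgment call, so it goes to the model.
    return ask_model(a, b), "model"
```

With these assumed thresholds, "Acme Plumbing, LLC" and "ACME Plumbing" are settled by code, while "Bob's HVAC" and "Robert Smith Heating & Air" on the same street fall through to the model.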
The rule I pulled out of that
Write code where you can, spend the model where you can't.
That split became how I build everything now. Anything that has to be exact lives in deterministic code. Math, dates, account numbers, thresholds, anything with a right answer. Code runs the same way every time, costs nothing per run, and never hallucinates.
The model gets the genuinely ambiguous work. The stuff that has no clean rule:
- Reading a poorly written broker email and pulling out what they actually mean
- Judging whether a clause in a contract is unusual
- Summarizing what a customer is really asking for under the noise
- Deciding whether 2 records are the same business
If your agent is doing arithmetic, you've already got a bug waiting to happen. The model will give you a number. It just won't always give you the same number, and it has no idea when it's wrong.
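For the exact side, a small sketch of what "arithmetic lives in code" can look like. The function name, the input shape, and the rounding rule are assumptions for illustration. The idea is that the model may pull line items out of a messy document, but the total is always computed here.

```python
from decimal import Decimal, ROUND_HALF_UP


def invoice_total(line_items: list[tuple[str, int]], tax_rate: str) -> Decimal:
    """Compute a total from (unit_price, quantity) pairs.

    Prices and the tax rate arrive as strings so Decimal can parse them
    exactly, without binary floating point error.
    """
    subtotal = sum((Decimal(price) * qty for price, qty in line_items), Decimal("0"))
    # Assumed rounding rule: tax is rounded half-up to the cent.
    tax = (subtotal * Decimal(tax_rate)).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    return subtotal + tax


# Same inputs, same answer, every run.
print(invoice_total([("19.99", 3), ("0.10", 7)], "0.0825"))  # 65.68
```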
How Marcus got better and cheaper at the same time
My deal-screening agent Marcus choked earlier this year. He was sloppy and expensive, and I couldn't trust his output without re-checking it, which defeats the point of having him.
The fix was exactly this surgery. I pulled the parts that needed to be exact out of the model's hands and into code, and left him only the judgment calls. He went from a poor performer to a trusted employee. The bill dropped too, because I stopped paying a frontier model to do long division.
The common failure mode is the opposite instinct, and I had it too at first. You get excited about the latest model and route everything through it, including the arithmetic. Then the invoice total gets creative. The model did its job. It just got handed the wrong one.
Boring is the feature
I think the mistake is treating the LLM as the product instead of one component in it. The model is the expensive, brilliant, occasionally unreliable part. You want it doing as little as possible, and only the part nothing else can do.
Everything around it should be predictable plumbing you can read, test, and trust.
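One piece of that plumbing, sketched under assumptions: a strict parser that sits between the model's reply and the rest of the system. The allowed words and the "return None for a human to look at" convention are hypothetical. What matters is that this step is plain code you can unit test.

```python
from typing import Optional

# Assumed contract: the prompt asks the model to answer with one of these words.
ALLOWED = {"same": True, "different": False}


def parse_verdict(raw: str) -> Optional[bool]:
    """Return True or False for a clean verdict, None when the reply is off-script."""
    word = raw.strip().lower().rstrip(".")
    return ALLOWED.get(word)
```

Anything that comes back as `None` never reaches downstream code as a guess. It gets flagged for review instead.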
A good agent is mostly boring code surrounding a well-supervised brain.