Engineering · 8 min read

Efficiency Is a Design Decision, Not an Optimization

Jeff Toffoli

The industry treats efficiency as something you do after you build. You ship the product, you watch the metrics, you optimize. Caching here, a lighter model there, maybe you batch some queries.

But the biggest efficiency gains aren't optimizations. They're decisions about what not to build.

The Scaffolding Tax

Every AI company in the customer-facing agent space is building scaffolding. Sierra builds workflow composers called Journeys. Decagon builds Agent Operating Procedures that compile natural language into code. Intercom separates Content from Guidance from Custom Answers and adds priority rules to decide which one wins.

Each layer exists because the company doesn't trust the model to handle the full job. So they build infrastructure to constrain it, route around it, or supplement it.

That infrastructure costs energy in every dimension:

Compute energy. Multi-model routing means running a classifier to decide which model handles the query before the model that actually handles it. Knowledge graph ingestion means processing and indexing documents that the model could read directly from context. Fine-tuning means training runs that produce a model snapshot that starts depreciating the day it's created.

Human energy. Workflow builders need someone to build the workflows. Staging environments need someone to promote changes. Knowledge pipelines need someone to curate the sources. Each layer creates maintenance work that didn't need to exist.

Cost energy. Every layer has infrastructure costs. Every custom model has training costs. Every integration has engineering costs. The scaffolding doesn't just cost money to build — it costs money to keep running, and the cost compounds as the system grows.

Cognitive energy. Every abstraction a user has to understand is friction. "Content" vs "Guidance" vs "Custom Answers" — these are three names for the same thing (tell the AI what to do). The distinction exists because the system needs it, not because the user does.

The Alternative: Less Machine

We don't sanitize user messages. The model handles adversarial input better than our regex filter could. Removing the filter saved code, removed a failure mode, and improved reliability. Less machine.

We don't build workflow builders. The model reasons with tools and instructions. If the business owner wants different behavior, they change the instructions in plain language. No visual flowchart, no compiled procedures, no staging-to-production pipeline. Less machine.

We don't run Opus for a wellness coach that sends two-sentence texts about drinking water. Sonnet handles it perfectly. The decision to use the right-sized model isn't optimization — it's design. Less machine.
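As a sketch of what "design, not optimization" means in practice: model choice lives in the agent's configuration, set once when the agent is designed, rather than in a runtime routing layer. The agent names, model names, and fields below are illustrative, not our actual configuration.

```python
# Model size is a per-agent design decision, fixed at configuration time.
# Agent names, model identifiers, and token limits are illustrative.
AGENT_CONFIGS = {
    "wellness_coach": {"model": "claude-sonnet", "max_tokens": 200},
    "contract_reviewer": {"model": "claude-opus", "max_tokens": 4000},
}

def model_for(agent: str) -> str:
    """Return the model the agent was designed with -- no runtime router."""
    return AGENT_CONFIGS[agent]["model"]
```

There is no classifier deciding per message; the two-sentence wellness coach never pays Opus prices because it was never wired to Opus in the first place.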

We don't separate "Knowledge Base" from "Guidance" from "Custom Answers." The owner writes instructions. The AI reads them. One text field, one concept, one thing to understand. Less machine.
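A minimal sketch of that single-concept data model (the type and field names are hypothetical, chosen only to show the shape): the owner writes one free-text field, and it goes into the system prompt verbatim instead of being split across content types with priority rules.

```python
from dataclasses import dataclass

@dataclass
class AgentConfig:
    business_name: str
    # One text field. Knowledge, guidance, and canned answers are all
    # just instructions the model reads -- no priority rules between them.
    instructions: str

def build_system_prompt(cfg: AgentConfig) -> str:
    return f"You are the assistant for {cfg.business_name}.\n{cfg.instructions}"
```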

Each of these decisions removes energy, cost, and human effort simultaneously. They're not tradeoffs. They're aligned — because the waste was never adding value in the first place.

The Bitter Lesson, Applied

Rich Sutton's bitter lesson in AI research: general methods that leverage computation always outperform special-purpose methods. The history of AI is the history of approaches that looked clever but couldn't compete with methods that simply used more compute on simpler architectures.

The customer-facing AI agent space is learning the same lesson in real time. Every model release makes scaffolding less necessary:

  • Multi-model routing? Claude handles the full range of queries. One model.
  • Knowledge graph ingestion? Context windows are large enough to include the relevant information directly. No preprocessing pipeline.
  • Fine-tuned models? General models now match or exceed fine-tuned performance on most tasks. No training runs.
  • Agent Operating Procedures? The model reasons about what to do next. No compiled flowcharts.

The companies that bet on scaffolding are fighting the model trajectory. Every improvement from Anthropic, OpenAI, or Google makes their custom infrastructure less valuable. They have to rebuild every 2-3 months to keep up — or accept that their proprietary layer is actively degrading relative to the general model underneath.

The efficient architecture is the one that improves when the model improves. One system prompt. One set of tools. One model. When Claude gets better, every agent on the platform gets better automatically. No rebuild. No retraining. No migration.
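The whole architecture above can be sketched as a single loop. This is a toy, not our implementation: `call_model` is a stub standing in for any chat-completions API, and the tools and routing logic inside it are invented for illustration. The structural point is what's absent, with no router, no compiled workflow, and no knowledge pipeline between the user and the model.

```python
SYSTEM_PROMPT = "You are the business's assistant. Use tools when needed."

# One flat tool list. Real tools would hit a calendar or SMS API.
TOOLS = {
    "book_appointment": lambda args: f"booked: {args}",
    "notify_owner": lambda args: f"owner notified: {args}",
}

def call_model(system, messages, tools):
    # Stub for a real LLM call. Returns either a tool request or a
    # final reply; the keyword check fakes the model's reasoning.
    last = messages[-1]
    if last["role"] == "user" and "book" in last["content"].lower():
        return {"tool": "book_appointment", "args": last["content"]}
    return {"reply": "All set, thanks!"}

def run_agent(user_message):
    messages = [{"role": "user", "content": user_message}]
    while True:
        out = call_model(SYSTEM_PROMPT, messages, TOOLS)
        if "reply" in out:
            return out["reply"]
        # Run the requested tool and hand the result back to the model.
        result = TOOLS[out["tool"]](out["args"])
        messages.append({"role": "tool", "content": result})
```

Swapping in a better model means changing one identifier inside `call_model`; nothing else in the loop knows or cares.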

Efficiency and Capability Are the Same Direction

This is the part most people miss. They think efficiency means giving something up — using a cheaper model, cutting features, accepting lower quality.

But when you remove scaffolding, you don't lose capability. You gain it. The model is better at reasoning than your flowchart. The model is better at understanding instructions than your knowledge graph query engine. The model is better at handling edge cases than your routing rules.

The scaffolding wasn't adding capability. It was constraining it. Removing the scaffolding makes the system more capable AND more efficient at the same time.

This is why the practice model works. An independent practitioner with the right architecture can deliver the same capability as a 200-person company with $500M in funding — because the capability comes from the model, and the model is the same. The difference is the machinery around it. More machinery means more cost, more maintenance, more energy, more fragility. Less machinery means the opposite.

What This Means for Pricing

We haven't set pricing yet. That's intentional. Pricing should reflect the value of the outcome, not the cost of the machinery. And when the machinery is minimal, the margin between cost and value is wide.

A customer-facing AI agent that handles missed calls, books appointments, follows up with leads, and escalates to the owner costs fractions of a cent per conversation in model tokens. The Twilio SMS costs more than the AI. The real cost is Jeff's time configuring it, not the compute.
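A back-of-the-envelope version of that comparison. Every number here is an illustrative round figure, not a quoted price: token rates vary by model, SMS rates vary by carrier and country, and conversation lengths vary. The point is the order of magnitude.

```python
# Illustrative per-million-token prices in USD; real model pricing differs.
PRICE_IN_PER_M = 3.00
PRICE_OUT_PER_M = 15.00

def conversation_cost(input_tokens: int, output_tokens: int) -> float:
    """Model-token cost of one conversation at the prices above."""
    return (input_tokens / 1e6) * PRICE_IN_PER_M + (output_tokens / 1e6) * PRICE_OUT_PER_M

# A short SMS back-and-forth: ~1,000 tokens in, ~200 out.
ai_cost = conversation_cost(1000, 200)   # about $0.006 -- fractions of a cent
sms_cost = 6 * 0.0079                    # 6 segments at an illustrative per-segment rate
```

Even with generous token counts, the model cost stays under a cent while the SMS legs of the same conversation cost several times more.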

That margin is what makes the practice sustainable. Not volume pricing. Not per-seat licensing. Not usage-based metering that penalizes success. The economics work because the architecture is efficient, and the architecture is efficient because we trust the model instead of building around it.

The Principle

Efficiency isn't making the machine faster. It's having less machine.

Every layer you add is a layer you maintain. Every abstraction you create is an abstraction someone has to understand. Every model you fine-tune is a model you retrain. Every workflow you build is a workflow that breaks when the model changes.

The most efficient system is the one where the model does the work, the tools provide the actions, and nothing else exists. That's the design decision. Everything after that is just keeping it clean.
