Coinbase has nearly halved its AI costs amid explosive consumption growth: the secret lies in token engineering
Coinbase CEO Brian Armstrong shared an unexpected strategy: the company cut its artificial intelligence costs nearly in half even as token consumption grew exponentially. The savings came not from strict limits or bans, but from smart routing, caching, and default settings.
Armstrong explained that Coinbase engineers can choose any model, but the defaults are crucial. The company is experimenting with open-weight models such as GLM 5.2 and Kimi 2.7 as the default, served through an internal gateway. Notably, 91% of employees never hit their usage limits, so rather than lowering the limits, Coinbase switched to cheaper default settings.
Routing, Cache, and Context Savings
In Coinbase's own systems, requests are pre-processed and routed to the most suitable model based on cache hits and cost. For example, a cutting-edge model is needed for planning but is overkill for execution. In Armstrong's view, model selection should be automated by AI itself rather than left to humans.
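The routing idea can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the model labels, prices, capability tiers, and cached-token discount are hypothetical and do not describe Coinbase's actual gateway. The sketch picks the cheapest model that is capable enough for the task phase, billing cached prompt tokens at a reduced rate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str               # hypothetical label, not a real product
    tier: int               # 2 = frontier, 1 = cheaper open-weight (assumed scale)
    price_per_mtok: float   # illustrative input price, USD per million tokens
    cached_ratio: float     # assumed fraction of the price charged for cached tokens


# Assumed minimum capability tier for each task phase.
REQUIRED_TIER = {"planning": 2, "execution": 1}

MODELS = [
    Model("frontier", tier=2, price_per_mtok=5.0, cached_ratio=0.1),
    Model("open-weight", tier=1, price_per_mtok=1.0, cached_ratio=0.1),
]


def estimated_cost(model: Model, prompt_tokens: int, cached_tokens: int) -> float:
    """Estimated input cost in USD, billing cached tokens at the reduced rate."""
    fresh_tokens = prompt_tokens - cached_tokens
    billable = fresh_tokens + cached_tokens * model.cached_ratio
    return billable * model.price_per_mtok / 1_000_000


def route(phase: str, prompt_tokens: int, cached_by_model: dict[str, int]) -> Model:
    """Pick the cheapest model whose tier is sufficient for the phase."""
    eligible = [m for m in MODELS if m.tier >= REQUIRED_TIER[phase]]
    return min(
        eligible,
        key=lambda m: estimated_cost(m, prompt_tokens, cached_by_model.get(m.name, 0)),
    )


print(route("planning", 10_000, {}).name)
print(route("execution", 10_000, {}).name)
print(route("execution", 10_000, {"frontier": 10_000}).name)
```

With these made-up numbers, planning goes to the frontier model and uncached execution goes to the open-weight one. In the last call the pricier model wins only because its prompt is already fully cached, which is the kind of trade-off cache-aware routing is meant to capture.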
Armstrong placed special emphasis on caching. Cache misses are the easiest way to drive up costs, so all requests at Coinbase are configured to reuse already processed information. In the LibreChat service, the cache hit rate rose from 5% to 60% after proper configuration.
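To get a feel for what a jump from 5% to 60% could mean for the bill, here is a rough back-of-the-envelope calculation in Python. It rests on two assumptions that are not from Armstrong: the hit rate is measured as a share of input tokens, and cached tokens are billed at 10% of the normal input price (the `CACHED_PRICE_RATIO` value is purely illustrative). Output tokens are ignored.

```python
def relative_input_cost(hit_rate: float, cached_price_ratio: float) -> float:
    """Input cost relative to a setup with no caching at all (1.0 = full price)."""
    return (1 - hit_rate) + hit_rate * cached_price_ratio


CACHED_PRICE_RATIO = 0.1  # assumption: cached tokens cost 10% of fresh ones

before = relative_input_cost(0.05, CACHED_PRICE_RATIO)  # 5% hit rate
after = relative_input_cost(0.60, CACHED_PRICE_RATIO)   # 60% hit rate
saving = 1 - after / before

print(f"before={before:.3f} after={after:.3f} saving={saving:.1%}")
```

Under these assumptions, input cost falls by roughly half. A different cached-token discount would change the figure, so this is an order-of-magnitude estimate rather than a reconstruction of Coinbase's numbers.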
Context savings also became an important factor. Armstrong advises starting new sessions when switching tasks, narrowly limiting file context, and disabling unused tools. The goal is not to spend fewer tokens, but to waste fewer of them.
Deutscher's "Barbell" Strategy
Analyst Miles Deutscher described a similar approach, calling it "token engineering" and proposing a "barbell" strategy to cut AI costs by 50% or more. He recommends entrusting the first 10% of the work, including project planning, to the smartest models, such as Opus or GPT. The middle 80%, the routine work, should be handled by a cheaper open-source model. The final 10%, including verification of the result, should again be assigned to high-level models. Deutscher has been using this scheme for several months and considers it the best way to reduce excessive AI spending.
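The arithmetic behind the barbell split can be checked with a short Python sketch. It assumes, for simplicity, that each share of the work consumes the same share of tokens and that the only variable is the price ratio between the cheap and the frontier model. The function name and the sample ratios are hypothetical.

```python
def barbell_saving(price_ratio: float, frontier_share: float = 0.2) -> float:
    """Fraction saved versus running all work on the frontier model.

    price_ratio: cheap-model price divided by frontier-model price.
    frontier_share: share of work kept on the frontier model (10% + 10%).
    """
    cheap_share = 1 - frontier_share
    # Cost of the blended setup, with the all-frontier setup normalized to 1.0.
    blended_cost = frontier_share + cheap_share * price_ratio
    return 1 - blended_cost


for ratio in (0.1, 0.25, 0.375):
    print(f"price ratio {ratio}: saving {barbell_saving(ratio):.1%}")
```

On these assumptions, the 50% mark is reached once the cheaper model costs no more than 37.5% of the frontier model's price, and a model at one tenth of the price would save about 72%.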
Expert opinion: Coinbase's strategy is not just a cost-saving measure but a new standard for corporate AI. Dividing tasks between "heavy" and "light" models, combined with intelligent caching, makes it possible to scale AI usage without a proportional increase in budget. This is a lesson for the entire industry: efficiency matters more than brute force.