Coinbase has halved its AI spending: the paradox of token consumption growth

28.06.2026

17:37

Coinbase CEO Brian Armstrong described an unexpected result of the company's strategy: even as its token consumption grew exponentially, AI costs were cut roughly in half. The secret, he says, lies not in blunt restrictions or limits but in fine-tuning the infrastructure.

Not Limits, but Smart Routing

Armstrong explained that Coinbase engineers can choose any AI model, but the default settings play a key role. The company is experimenting with open models, such as GLM 5.2 and Kimi 2.7, served through an internal gateway. Notably, 91% of employees never hit their usage limits, so the company switched to cheaper configurations without lowering the thresholds.

The savings rest on intelligent request routing. The system automatically directs each task to the most suitable model, taking caching and cost into account. For example, a flagship model handles strategic planning, while a lighter, cheaper one handles routine execution. Armstrong emphasizes that model selection should be automated by the AI itself rather than left to a human.
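The routing idea can be illustrated with a minimal Python sketch. It is not Coinbase's gateway: the model names, prices, tiers, and the task-to-tier mapping are all hypothetical placeholders, and cache awareness is left out for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    cost_per_mtok: float  # hypothetical USD price per million input tokens
    tier: str             # "flagship" or "light"


# Hypothetical catalogue: names and prices are placeholders, not real quotes.
MODELS = [
    Model("flagship-model", 15.0, "flagship"),
    Model("light-model", 1.0, "light"),
]

# Assumed mapping from the kind of task to the tier it needs.
TIER_FOR_TASK = {"planning": "flagship", "execution": "light"}


def route(task_kind: str) -> Model:
    """Pick the cheapest model in the tier assigned to this kind of task.

    Unknown task kinds fall back to the light tier.
    """
    tier = TIER_FOR_TASK.get(task_kind, "light")
    candidates = [m for m in MODELS if m.tier == tier]
    return min(candidates, key=lambda m: m.cost_per_mtok)
```

Here `route("planning")` selects the flagship entry and `route("execution")` the light one. The point of the design is that the caller never names a model, so defaults can be changed in one place.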

Cache and Context: Two Pillars of Savings

Caching gets separate attention: cache misses are the easiest way to send costs soaring. At Coinbase, all requests are configured to reuse already processed information. In the LibreChat service, the cache hit rate rose from 5% to 60% after proper configuration.
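To see why the hit rate matters so much, here is a rough Python sketch of the arithmetic. The price per million tokens and the discount for cached tokens are assumptions chosen for illustration; real providers price cached input differently.

```python
def effective_input_cost(
    tokens: int,
    hit_rate: float,
    price_per_mtok: float,
    cached_price_ratio: float,
) -> float:
    """Input cost in USD when a share `hit_rate` of tokens is served from cache.

    `cached_price_ratio` is the fraction of the full price charged for
    cached tokens (an assumed parameter, e.g. 0.1 means 10% of full price).
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    cached = tokens * hit_rate
    fresh = tokens - cached
    total = fresh * price_per_mtok + cached * price_per_mtok * cached_price_ratio
    return total / 1_000_000


# Assumed numbers: 1M input tokens at $10 per million, cached tokens at 10% of price.
low = effective_input_cost(1_000_000, 0.05, 10.0, 0.1)   # 5% hit rate
high = effective_input_cost(1_000_000, 0.60, 10.0, 0.1)  # 60% hit rate
```

Under these assumed prices, the same million tokens cost about $9.55 at a 5% hit rate and about $4.60 at 60%, which is roughly half without any reduction in token volume.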

Saving on context is equally important. Armstrong advises starting a new session when switching tasks, keeping file context narrow, and disabling unused tools. The goal is not to spend fewer tokens but to avoid wasting them. It is this approach that allowed Coinbase to nearly halve its costs while consumption continues to grow.

Deutscher's "Barbell" Strategy

Analyst Miles Deutscher describes a similar methodology, which he calls "token engineering." He proposes a "barbell" strategy to reduce AI costs by 50% or more. The first 10% of the work, including project planning, should be entrusted to the most capable models (Opus, GPT), since this is the critical stage. The bulk of the work, the 80% made up of routine tasks, should go to a cheaper open-source model. He recommends handing the final 10%, including result verification, back to top-tier models. Deutscher has been using this scheme for several months and considers it the best way to cut excessive AI spending.
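The 10/80/10 split can be checked with a short Python calculation. This is a sketch under two assumptions that are not in Deutscher's description: that the share of work equals the share of tokens, and that the per-token prices below are representative.

```python
def barbell_savings(
    premium_price: float,
    cheap_price: float,
    premium_share: float = 0.2,
) -> float:
    """Fraction saved versus running everything on the premium model.

    `premium_share` is the share of tokens sent to the premium model
    (10% planning + 10% verification = 0.2 in the barbell split).
    Prices are per token in any consistent unit.
    """
    blended = premium_share * premium_price + (1.0 - premium_share) * cheap_price
    return 1.0 - blended / premium_price


# Assumed prices: the cheap model costs one tenth of the premium one.
saved = barbell_savings(premium_price=15.0, cheap_price=1.5)
```

With a cheap model at one tenth of the premium price, the blended cost is 28% of the all-premium baseline, a saving of about 72%. The saving shrinks as the price gap narrows and is zero when both models cost the same.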

Expert opinion: Coinbase's experience is a clear example that the efficiency of AI infrastructure is determined not by the volume of investment, but by the architecture of its use. For the crypto industry, where every cent counts, smart routing and caching become not just an option, but a necessity. This is a lesson for all projects seeking to scale AI without exorbitant costs.
