Coinbase nearly halves its AI spending amid explosive growth in token consumption: the secret is not in limits

28.06.2026

15:56

Coinbase CEO Brian Armstrong shared details on how the exchange managed to cut artificial intelligence costs by nearly half, despite exponential growth in token consumption. The key to success lies not in strict restrictions and budget limits, but in smart infrastructure tuning.

Armstrong said that Coinbase engineers can choose any AI model, but that default settings are crucial. The company is experimenting with making cheaper open-weight models, such as GLM 5.2 and Kimi 2.7, the default through an internal gateway. Notably, 91% of employees never hit the set limits, so Coinbase opted to optimize default parameters rather than reduce quotas. This let the company not only curb cost growth but reverse it.

Routing, Caching, and Context Savings

In Coinbase's own systems, requests are pre-processed and routed to the most suitable model based on cache hit probability and cost. For example, an advanced model is necessary for planning but excessive for routine execution. Armstrong emphasizes that model selection should ultimately be handled automatically by AI rather than by humans.
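The routing idea can be illustrated with a minimal Python sketch. It is not Coinbase's implementation: the model names, prices, cached-token discount, and the `can_plan` flag are all hypothetical values chosen for illustration. The router simply picks the cheapest eligible model once the expected cache hit rate is factored into the price.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    usd_per_mtok: float      # hypothetical price per million input tokens
    cached_fraction: float   # assumed share of the price charged for cached tokens
    can_plan: bool           # assumed flag: strong enough for planning tasks


def expected_cost(model: Model, tokens: int, cache_hit_prob: float) -> float:
    """Expected input cost in USD, blending cached and uncached token prices."""
    cached = tokens * cache_hit_prob
    fresh = tokens - cached
    per_token = model.usd_per_mtok / 1_000_000
    return fresh * per_token + cached * per_token * model.cached_fraction


def route(task: str, tokens: int, hit_prob: dict[str, float], models: list[Model]) -> Model:
    """Pick the cheapest model that is allowed to handle this kind of task."""
    eligible = [m for m in models if task != "planning" or m.can_plan]
    if not eligible:
        raise ValueError(f"no model can handle task {task!r}")
    return min(eligible, key=lambda m: expected_cost(m, tokens, hit_prob.get(m.name, 0.0)))


# Hypothetical catalogue and cache hit estimates, for illustration only.
MODELS = [
    Model("frontier", usd_per_mtok=15.0, cached_fraction=0.1, can_plan=True),
    Model("open-weight", usd_per_mtok=1.0, cached_fraction=0.1, can_plan=False),
]
HIT_PROB = {"frontier": 0.6, "open-weight": 0.2}

print(route("planning", 50_000, HIT_PROB, MODELS).name)   # frontier
print(route("execution", 50_000, HIT_PROB, MODELS).name)  # open-weight
```

In this toy setup, planning requests go to the expensive model because it is the only eligible one, while routine execution falls through to the cheaper model.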

Special attention is given to caching. Cache misses are the easiest way to drive up costs. At Coinbase, all requests are configured to reuse already processed information. In the LibreChat service, the cache hit rate increased from 5% to 60% after proper tuning.
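To get a feel for what a jump from a 5% to a 60% hit rate can mean, here is a rough back-of-the-envelope calculation in Python. It rests on an assumption that does not come from Armstrong: cached input tokens are billed at 10% of the normal price. It also ignores output tokens and any cache-write surcharges, so the result is an illustration, not Coinbase's actual figure.

```python
def relative_input_cost(hit_rate: float, cached_fraction: float = 0.1) -> float:
    """Input cost relative to a setup with no cache hits at all (1.0 = full price).

    cached_fraction is an assumed discount: cached tokens cost 10% of the normal price.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return (1.0 - hit_rate) + hit_rate * cached_fraction


before = relative_input_cost(0.05)   # 5% cache hit rate
after = relative_input_cost(0.60)    # 60% cache hit rate
saving = 1.0 - after / before

print(f"before={before:.3f} after={after:.3f} saving={saving:.1%}")
# before=0.955 after=0.460 saving=51.8%
```

Under these assumptions, the input-token bill for the same traffic drops by roughly half; with a smaller discount on cached tokens the effect would be weaker.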

Context savings are also critically important. Armstrong advises starting new sessions when switching tasks, narrowly limiting file context, and disabling unused tools. The goal is not to spend fewer tokens, but to minimize wasteful token usage. It is this comprehensive approach that allowed Coinbase to nearly halve AI costs amid continued consumption growth.
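A small Python sketch shows why narrowing file context and disabling unused tools matters. Everything here is hypothetical: the tool names, the file names, the text sizes, and the rule of thumb of about four characters per token, which is only a crude estimate and not a real tokenizer.

```python
def estimate_tokens(text: str) -> int:
    """Very rough heuristic (assumption): about 4 characters per token."""
    return len(text) // 4


def build_context(tools: dict[str, str], files: dict[str, str],
                  needed_tools: set[str], needed_files: set[str]) -> dict[str, str]:
    """Keep only the tool descriptions and files the current task needs."""
    context = {f"tool:{name}": desc for name, desc in tools.items() if name in needed_tools}
    context.update({f"file:{path}": body for path, body in files.items() if path in needed_files})
    return context


def context_tokens(context: dict[str, str]) -> int:
    """Sum the estimated token count of everything placed in the context."""
    return sum(estimate_tokens(text) for text in context.values())


# Hypothetical tool descriptions and files, sized in characters.
TOOLS = {"search": "x" * 400, "browser": "x" * 2000, "sql": "x" * 1200}
FILES = {"a.py": "x" * 4000, "b.py": "x" * 8000}

full = build_context(TOOLS, FILES, set(TOOLS), set(FILES))
trimmed = build_context(TOOLS, FILES, {"search"}, {"a.py"})

print(context_tokens(full), context_tokens(trimmed))  # 3900 1100
```

The overhead is paid again on every request in a session, which is why trimming it targets wasted tokens rather than useful ones.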

Deutscher's "Barbell" Strategy: 10-80-10

Analyst Miles Deutscher describes a similar approach and calls the present moment the era of "token engineering." He proposes a "barbell" strategy to reduce AI costs by 50% or more. He recommends entrusting the first 10% of the work (project planning) to the smartest models, such as Opus or GPT, as this is the most critical stage.

For the middle 80%, the bulk of routine work, he suggests using a cheaper open-source model. The final 10%, along with result verification, is again assigned to high-end models. Deutscher claims to have been applying this scheme for several months and considers it the best way to reduce excessive AI spending.
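The arithmetic behind the 10-80-10 split can be checked with a short Python sketch. It makes a simplifying assumption that Deutscher does not state: the share of tokens matches the share of work, so 20% of tokens go to the premium model and 80% to the cheap one. The price ratio is a hypothetical input.

```python
def barbell_relative_cost(price_ratio: float, premium_share: float = 0.2) -> float:
    """Cost of the split relative to running everything on the premium model.

    price_ratio: cheap-model price divided by premium-model price (assumed input).
    premium_share: share of tokens sent to the premium model (10% + 10% by default).
    """
    if not 0.0 <= premium_share <= 1.0:
        raise ValueError("premium_share must be between 0 and 1")
    return premium_share + (1.0 - premium_share) * price_ratio


def max_ratio_for_saving(target_saving: float, premium_share: float = 0.2) -> float:
    """Largest cheap/premium price ratio that still reaches the target saving."""
    return (1.0 - target_saving - premium_share) / (1.0 - premium_share)


# Hypothetical case: the cheap model costs one tenth of the premium one.
print(f"{barbell_relative_cost(0.1):.2f}")   # 0.28
print(f"{max_ratio_for_saving(0.5):.3f}")    # 0.375
```

With a cheap model at one tenth of the premium price, the blended bill is 28% of the all-premium bill. A 50% saving is reachable under these assumptions as long as the cheap model costs no more than 37.5% of the premium one.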

Cryptalist Analysis: Coinbase's experience is a textbook case for the entire industry. We are moving from the era of "just use the most powerful model" to the era of "use the right model for the right task." Smart routing and aggressive caching are not just about savings; they are the new standard of efficiency. Companies that fail to implement such practices risk simply burning capital on uncontrolled AI consumption.
