Coinbase has nearly halved its AI spending: how engineering optimization is outpacing the rise in token consumption

29.06.2026

06:38

Coinbase CEO Brian Armstrong shared a case study: the company nearly halved its artificial intelligence costs even as token consumption grew exponentially. The key takeaway is that the savings came not from strict limits and spending notifications, but from well-chosen default settings, request routing, and caching.

Armstrong emphasized that engineers can choose any model, but the default settings are what matter most. Coinbase is experimenting with serving cheaper open-weight models such as GLM 5.2 and Kimi 2.7 through an internal gateway. Notably, 91% of employees never hit their usage limits, so instead of lowering the limits the company switched to cheaper default configurations.

Routing, Cache, and Context Savings

Coinbase's internal system pre-processes requests and directs each one to the most suitable model based on cache hits and cost. For example, an advanced model is needed for planning but is overkill for execution. Ultimately, model selection should be automated by the AI itself rather than left to a human.
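The routing idea can be illustrated with a minimal sketch. It is not Coinbase's gateway: the model names, capability tiers, prices, and the `route` function are all hypothetical, chosen only to show a router that picks the cheapest sufficiently capable model while accounting for tokens already cached on each one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    tier: int            # 2 = advanced, 1 = cheaper open-weight (hypothetical scale)
    price: float         # USD per million uncached input tokens (made-up figure)
    cached_price: float  # USD per million cached input tokens (made-up figure)


# Hypothetical catalogue; real names and prices will differ.
MODELS = [
    Model("frontier-model", tier=2, price=5.0, cached_price=0.5),
    Model("open-weight-model", tier=1, price=1.0, cached_price=0.1),
]

# Minimum tier each phase needs: planning gets the advanced model,
# execution can run on the cheaper one.
REQUIRED_TIER = {"plan": 2, "execute": 1}


def input_cost(model: Model, prompt_tokens: int, cached_tokens: int) -> float:
    """Estimated input cost in USD, billing cached tokens at the cached rate."""
    cached = min(cached_tokens, prompt_tokens)
    fresh = prompt_tokens - cached
    return (fresh * model.price + cached * model.cached_price) / 1_000_000


def route(phase: str, prompt_tokens: int, cached_by_model: dict[str, int]) -> Model:
    """Pick the cheapest model that is capable enough for the phase."""
    needed = REQUIRED_TIER.get(phase, 2)  # unknown phases get the strongest tier
    capable = [m for m in MODELS if m.tier >= needed]
    return min(
        capable,
        key=lambda m: input_cost(m, prompt_tokens, cached_by_model.get(m.name, 0)),
    )


print(route("plan", 50_000, {}).name)
print(route("execute", 50_000, {}).name)
# With the whole prompt already cached on the advanced model, it becomes
# the cheaper option for this request under the assumed prices.
print(route("execute", 50_000, {"frontier-model": 50_000}).name)
```

Under these assumed prices, a cold execution request goes to the cheaper model, while a fully cached prompt can make the advanced model the less expensive choice, which is why the router weighs cache state as well as list price.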

Armstrong specifically highlighted the role of caching. Cache misses are the easiest way to drive up costs, so all requests at Coinbase are configured to reuse information that has already been processed. In the LibreChat service, the cache hit rate rose from 5% to 60% after proper configuration.
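To see why the hit rate matters so much, here is a rough back-of-the-envelope calculation. The 5% and 60% hit rates come from the article; the assumption that a cached input token costs 10% of an uncached one is purely illustrative, and the function name is hypothetical. Output tokens and cache-write fees are ignored.

```python
def relative_input_cost(hit_rate: float, cached_discount: float = 0.1) -> float:
    """Input cost relative to a 0% cache hit rate.

    cached_discount is the price of a cached token as a fraction of an
    uncached one. The 0.1 default is an assumption, not a Coinbase figure.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return (1.0 - hit_rate) + hit_rate * cached_discount


before = relative_input_cost(0.05)  # hit rate before configuration
after = relative_input_cost(0.60)   # hit rate after configuration
print(f"before: {before:.3f}, after: {after:.3f}, saving: {1 - after / before:.0%}")
```

With the assumed discount, the input bill for the same traffic drops by roughly half. The real figure depends on the provider's actual cached-token pricing.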

Context savings are also important. Armstrong advises starting new sessions when switching tasks, narrowly limiting file context, and disabling unused tools. The goal is not to spend fewer tokens, but to waste fewer of them. It is this approach that allowed Coinbase to nearly halve costs while consumption continues to grow.
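The context advice can be sketched in a few lines. This is an illustration only: the `build_context` helper, the file and tool names, and the rule of thumb of about four characters per token are all assumptions, not part of any real tool.

```python
def estimate_tokens(text: str) -> int:
    """Crude size estimate: about 4 characters per token (an assumption)."""
    return max(1, len(text) // 4)


def build_context(
    files: dict[str, str],
    tools: dict[str, str],
    relevant_paths: set[str],
    needed_tools: set[str],
) -> dict:
    """Keep only the files and tool definitions the current task needs."""
    kept_files = {path: body for path, body in files.items() if path in relevant_paths}
    kept_tools = {name: spec for name, spec in tools.items() if name in needed_tools}
    tokens = sum(estimate_tokens(t) for t in kept_files.values())
    tokens += sum(estimate_tokens(t) for t in kept_tools.values())
    return {"files": kept_files, "tools": kept_tools, "estimated_tokens": tokens}


# Hypothetical workspace: one small relevant file, one large unrelated one,
# and two tool definitions of which only one is used.
files = {"billing.py": "x" * 400, "legacy_dump.sql": "y" * 4000}
tools = {"search": "s" * 200, "browser": "b" * 2000}

everything = build_context(files, tools, set(files), set(tools))
trimmed = build_context(files, tools, {"billing.py"}, {"search"})
print(everything["estimated_tokens"], trimmed["estimated_tokens"])
```

Because the context is resent with each request, anything trimmed here is saved on every turn of the session, not just once.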

Deutscher's "Barbell" Strategy

Analyst Miles Deutscher described a similar approach, calling it "token engineering." He proposed a "barbell" strategy to reduce AI costs by 50% or more. The first 10% of the work, project planning, should be entrusted to the smartest models, such as Opus or GPT. The middle 80%, the routine work, should be handled by a cheaper open-source model. The final 10%, verifying the result, should again go to a top-tier model. Deutscher has been applying this scheme for several months and considers it the best way to cut excessive AI spending.
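The arithmetic behind the barbell split can be checked with a short sketch. It assumes that each share of the work consumes the same share of tokens, which may not hold in practice. The price ratios are hypothetical examples, and the function names are invented for illustration.

```python
def barbell_relative_cost(cheap_price_ratio: float, premium_share: float = 0.2) -> float:
    """Cost relative to running all of the work on the premium model.

    premium_share: fraction of tokens sent to the premium model (10% + 10%).
    cheap_price_ratio: cheap model price divided by premium model price.
    """
    return premium_share + (1.0 - premium_share) * cheap_price_ratio


def max_ratio_for_saving(target_saving: float, premium_share: float = 0.2) -> float:
    """Largest cheap/premium price ratio that still reaches the target saving."""
    return (1.0 - target_saving - premium_share) / (1.0 - premium_share)


for ratio in (0.05, 0.1, 0.375):  # hypothetical price ratios
    cost = barbell_relative_cost(ratio)
    print(f"price ratio {ratio}: relative cost {cost:.2f}, saving {1 - cost:.0%}")

print(f"ratio needed for a 50% saving: {max_ratio_for_saving(0.5):.3f}")
```

Under this simplification, the 20% of work kept on premium models sets a floor on cost, so a saving of 50% or more requires the cheap model to cost at most about 37.5% of the premium one.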

Expert Commentary: The Coinbase case demonstrates a mature approach to managing AI infrastructure. Instead of panicking and restricting access, the company implemented intelligent routing and caching, which is a sound engineering solution. For the crypto industry, where every cent counts, such pragmatism is not just a way to save money but also a guarantee of sustainable scaling in the face of growing competition.
