
[FEATURE]: Anthropic (and others) caching improvement

ormandj opened this issue 1 month ago • 1 comment

Feature hasn't been suggested before.

  • [x] I have verified this feature I'm about to request hasn't been suggested before.

Describe the enhancement you want to request

I've recently started using opencode and noticed my token usage was somewhat higher than with the same general workflow in Claude Code. After a little research, I determined that the cache model and prompt structure being used were suboptimal for Claude-based models.
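For context, Anthropic's prompt caching works by marking cache breakpoints with `cache_control` on content blocks; everything up to the breakpoint becomes a reusable prefix billed at the cheaper cache-read rate on subsequent requests. A minimal sketch using the official `@anthropic-ai/sdk` (the model id and prompt contents are illustrative, not taken from the PR):

```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

// A large, static block (agent instructions, tool docs, etc.) that is
// identical across requests and therefore worth caching.
const staticInstructions = "You are a code-review agent. ...";

const response = await client.messages.create({
  model: "claude-sonnet-4-20250514", // illustrative model id
  max_tokens: 1024,
  system: [
    {
      type: "text",
      text: staticInstructions,
      // Everything up to this breakpoint is cached; later requests that
      // share the prefix pay the cache-read rate instead of full price.
      cache_control: { type: "ephemeral" },
    },
  ],
  messages: [{ role: "user", content: "Review the diff in src/index.ts" }],
});

// Usage reports cache_creation_input_tokens / cache_read_input_tokens,
// which is how the cached-vs-non-cached comparison below was measured.
console.log(response.usage);
```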

I've submitted a PR that attempts to address this and allows configuration at both the provider and per-agent level. Some of my workstreams run for long periods of time, with gaps in between runs for certain types of agents (for example, review agents may not run frequently, but when they do, they use a large amount of static context for their instructions), so allowing the TTL to be overridden at the agent level made sense to me.
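To illustrate the shape of such a configuration (a hypothetical sketch; field names are illustrative and not necessarily what the PR implements), a provider-level default TTL with a per-agent override might look like:

```typescript
// Hypothetical config shape, not the PR's actual schema.
interface CacheConfig {
  enabled?: boolean;
  ttl?: "5m" | "1h"; // Anthropic currently offers 5-minute and 1-hour cache TTLs
}

interface AgentConfig {
  cache?: CacheConfig; // overrides the provider default when present
}

interface ProviderConfig {
  cache?: CacheConfig; // provider-wide default
  agents?: Record<string, AgentConfig>;
}

const anthropic: ProviderConfig = {
  cache: { enabled: true, ttl: "5m" }, // default for short-lived, chatty agents
  agents: {
    // A review agent runs infrequently but reuses large static context,
    // so a longer TTL keeps its cached prefix warm between runs.
    review: { cache: { ttl: "1h" } },
  },
};
```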

I did some basic testing with the patch, and it made a significant difference in non-cached vs. cached usage, which, with Claude pricing, can make a huge difference in the cost of using these LLMs. Unfortunately, the minimum cacheable prompt size isn't available programmatically, so I had to build a lookup table for the various models. Basic performance/cache testing is in the PR.
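A lookup table along these lines might map model families to Anthropic's documented minimum cacheable prompt lengths. The values and model ids below follow Anthropic's public docs at the time of writing and may drift as models change; I haven't cross-checked them against the PR:

```typescript
// Minimum cacheable prompt length, in tokens, per model family.
// Anthropic silently skips caching for prompts below these minimums.
const MIN_CACHEABLE_TOKENS: Record<string, number> = {
  "claude-3-opus": 1024,
  "claude-3-5-sonnet": 1024,
  "claude-3-7-sonnet": 1024,
  "claude-3-haiku": 2048,
  "claude-3-5-haiku": 2048,
};

// Prefix match so dated model ids ("claude-3-5-sonnet-20241022") resolve
// to their family entry; fall back to a conservative default otherwise.
function minCacheableTokens(modelId: string): number {
  for (const [family, min] of Object.entries(MIN_CACHEABLE_TOKENS)) {
    if (modelId.startsWith(family)) return min;
  }
  return 2048; // conservative fallback for unknown models
}

// Only emit a cache breakpoint when the static prefix clears the
// model's minimum; below that, the breakpoint is wasted.
function shouldCache(modelId: string, promptTokens: number): boolean {
  return promptTokens >= minCacheableTokens(modelId);
}
```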

I tried to create the PR in a way that wouldn't negatively impact other models/providers, but that could also serve as a starting point for other models/providers with specific cache implementation requirements. Later, this model could be extended to cover configuration beyond caching.
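One way to read that design (again a hypothetical sketch of the extension point, not the PR's actual interfaces): providers that need special cache handling register a small strategy, and everyone else gets a no-op default, so Anthropic-specific behavior can't leak into other providers:

```typescript
// Hypothetical extension point; names are illustrative, not from the PR.
interface CacheStrategy<Req> {
  // Rewrites an outgoing request to apply provider-specific cache handling.
  apply(request: Req, opts: { ttl?: "5m" | "1h" }): Req;
}

// Default for providers with no special cache requirements.
const noopStrategy: CacheStrategy<unknown> = {
  apply: (request) => request,
};

// Lookup falls back to the no-op, so adding a provider-specific strategy
// is purely additive for everyone else.
const strategies = new Map<string, CacheStrategy<any>>();

function strategyFor(provider: string): CacheStrategy<any> {
  return strategies.get(provider) ?? noopStrategy;
}
```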

ormandj · Dec 12 '25 02:12

This issue might be a duplicate of existing issues. Please check:

  • #4317: Feature request for generic /compact command, auto-compaction, and fork-aware conversations (addresses context management and compaction at a broader level)

The proposed solution in #5416 (TTL-based cache configuration for Anthropic) complements the broader compaction strategy discussed in #4317. Both issues aim to optimize token usage, but from different angles: #5416 focuses on cache TTL specifics for Anthropic models, while #4317 addresses core compaction infrastructure.

Feel free to ignore if this is specifically an Anthropic caching implementation that differs from the broader compaction approach.

github-actions[bot] · Dec 12 '25 02:12