feat(provider): add provider-specific cache configuration system (significant token usage reduction)
Summary
Implements a comprehensive `ProviderConfig` system for provider-specific caching and prompt optimization.
Closes #5416
Test Results with Claude Opus 4.5 (my primary target for improvement)
A/B testing with identical prompts (same session, same agent) comparing legacy vs optimized behavior:
| Metric | Legacy | Optimized | Improvement |
|---|---|---|---|
| Cache writes (post-warmup) | 18,417 tokens | ~10,340 tokens | 44% reduction |
| Effective cost (3rd prompt) | 13,021 tokens | 3,495 tokens | 73% reduction |
| Initial cache write | 16,211 tokens | 17,987 tokens | +11% (expected) |
| Cache hit rate | 100% | 100% | Same |
The slightly larger initial cache write (+11%, roughly 1,800 extra tokens) is quickly amortized by dramatically fewer cache invalidations in subsequent requests: by the third prompt the effective cost is already roughly 9,500 tokens lower, more than recovering the up-front cost.
Provider testing
I tested with the providers I have access to. I can confirm it works with Anthropic (Opus 4.5), Google (Gemini 3 Pro preview), and Mistral (Devstral 2).
Changes
- Add `ProviderConfig` namespace with defaults for 19+ providers
- Support three caching paradigms (see the sketch after this list):
  - Explicit breakpoint (Anthropic, Bedrock, Google Vertex Anthropic)
  - Automatic prefix (OpenAI, Azure, GitHub Copilot, DeepSeek)
  - Implicit/content-based (Google/Gemini)
- Add tool sorting for cache consistency across requests
- Add tool caching for explicit breakpoint providers
- Add user config overrides via `opencode.json` (provider and agent level)
- Simplify system message handling with a `combineSystemMessages` boolean
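To make the paradigm split concrete, here is a minimal sketch of what the per-provider defaults could look like. All type and field names below are illustrative assumptions, not the actual shape in `src/provider/config.ts`:

```ts
// Illustrative sketch only; the real shape in src/provider/config.ts may differ.
type CacheParadigm = "explicit-breakpoint" | "automatic-prefix" | "implicit"

interface ProviderCacheDefaults {
  paradigm: CacheParadigm
  cacheTools?: boolean            // place a cache breakpoint after the tool definitions
  sortTools?: boolean             // keep tool ordering stable so the cached prefix never shifts
  combineSystemMessages?: boolean // collapse system messages into a single block before caching
  minTokens?: number              // smallest prompt the provider will cache, where one is enforced
}

const PROVIDER_DEFAULTS: Record<string, ProviderCacheDefaults> = {
  // Explicit breakpoint: the client marks cache points in the request (Anthropic-style)
  anthropic: {
    paradigm: "explicit-breakpoint",
    cacheTools: true,
    sortTools: true,
    combineSystemMessages: true,
    minTokens: 1024,
  },
  // Automatic prefix: the provider caches matching prompt prefixes on its own (OpenAI-style)
  openai: {
    paradigm: "automatic-prefix",
    sortTools: true,
  },
  // Implicit/content-based: caching is decided entirely server-side (Gemini-style)
  google: {
    paradigm: "implicit",
    sortTools: true,
  },
}
```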
Config Priority
Provider defaults → User provider config → User agent config
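As a rough sketch of that precedence (the names here are hypothetical, not the PR's actual API), later layers simply override earlier ones:

```ts
// Hypothetical sketch of the override precedence; not code from this PR.
interface CacheConfig {
  enabled?: boolean
  ttl?: "5m" | "1h"
  minTokens?: number
}

function resolveCacheConfig(
  providerDefaults: CacheConfig,
  userProviderConfig: CacheConfig = {},
  userAgentConfig: CacheConfig = {},
): CacheConfig {
  // Later sources win: defaults < user provider config < user agent config
  return { ...providerDefaults, ...userProviderConfig, ...userAgentConfig }
}

// With the example config below, the "plan" agent ends up with ttl "1h" and
// minTokens 2048 layered on top of the built-in defaults (illustrative values).
const resolved = resolveCacheConfig(
  { enabled: true, ttl: "5m", minTokens: 1024 }, // built-in provider defaults
  { enabled: true, ttl: "1h", minTokens: 2048 }, // opencode.json provider block
  { ttl: "1h" },                                 // opencode.json agent block
)
```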
Area for future optimization
Currently, models.dev doesn't provide information about minimum cache sizes or prompt requirements, so these had to be written out as configuration here. It would be ideal if the model definitions were updated with this detail. Until then, as providers/models are added or updated, the configuration should be updated to match for optimal performance.
New Files
- `src/provider/config.ts` (874 lines)
- `test/provider/config.test.ts` (215 tests)
Example Config
{
  "provider": {
    "anthropic": {
      "cache": {
        "enabled": true,
        "ttl": "1h",
        "minTokens": 2048
      }
    }
  },
  "agent": {
    "plan": {
      "cache": {
        "ttl": "1h"
      }
    }
  }
}
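For an explicit-breakpoint provider such as Anthropic, settings like these ultimately become cache markers on the outgoing request. Below is a hedged sketch of that mapping; it mirrors Anthropic's prompt-caching request shape and is not code from this PR:

```ts
// Hedged sketch, not code from this PR: attach a cache breakpoint to the last
// system block, carrying the ttl resolved from the config above. A real
// implementation would also skip the breakpoint for prompts below minTokens.
const resolved = { enabled: true, ttl: "1h" as const, minTokens: 2048 }
const systemPrompt = "You are opencode..." // placeholder; the real prompt is assembled elsewhere

const systemBlocks = resolved.enabled
  ? [
      {
        type: "text" as const,
        text: systemPrompt,
        cache_control: { type: "ephemeral" as const, ttl: resolved.ttl },
      },
    ]
  : [{ type: "text" as const, text: systemPrompt }]
```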
Please merge this ❤️
I'm deciding whether we should use opencode at our company, and I'm actually surprised this wasn't implemented until now.
Updated one of the comments to be clearer, but no functional changes.
hey thanks for working on this - i'm actually currently mid refactor on a bunch of code where we call LLMs so we probably can't explicitly accept this PR (llm-centralization branch)
also i don't know if we want to go as far as making all this deeply configurable yet
could you lay out all the things you improved on top of what we have currently? then i can make sure those get included
i can also just read the PR/have opencode summarize haha. i'll do a pass once that llm-centralization branch is merged
Ok, that's fine. I added comments throughout that should help your opencode summarization, but I'm happy to discuss if you need more clarity; I'm on Discord as well. I was initially going to refactor all of your provider handling, but it sounds like you're already doing that, so I designed this PR as a stepping stone towards it rather than making my very first PR a massive rewrite of lots of things.
If you just want to take my work and fold it into yours, I understand; at the end of the day it's of huge benefit to the users, so I'll be happy. Thank you for taking the time to evaluate things, and I hope to see your rewrite soon! I can also just update this once you're done/merged in, just let me know what works best.
Hey, thanks for this PR. I started playing around with it. Can you do me a favor and add the following patch to visualize token cache statistics in the sidebar? FYI: I am not 100% confident that this is the best way of calculating the percentage.
Should this be shown in the sidebar? I think it should: not only will it help us spot issues, it will also let users notice models that do not support caching.
diff --git a/packages/opencode/src/cli/cmd/tui/routes/session/sidebar.tsx b/packages/opencode/src/cli/cmd/tui/routes/session/sidebar.tsx
index c1c29a73..afaaad73 100644
--- a/packages/opencode/src/cli/cmd/tui/routes/session/sidebar.tsx
+++ b/packages/opencode/src/cli/cmd/tui/routes/session/sidebar.tsx
@@ -42,9 +42,20 @@ export function Sidebar(props: { sessionID: string }) {
const total =
last.tokens.input + last.tokens.output + last.tokens.reasoning + last.tokens.cache.read + last.tokens.cache.write
const model = sync.data.provider.find((x) => x.id === last.providerID)?.models[last.modelID]
+
+ // Calculate cache hit percentage
+ const cacheHitPercentage = total > 0 ? Math.round((last.tokens.cache.read / total) * 100) : 0
+ const cacheRead = last.tokens.cache.read
+ const cacheWrite = last.tokens.cache.write
+
return {
tokens: total.toLocaleString(),
percentage: model?.limit.context ? Math.round((total / model.limit.context) * 100) : null,
+ cache: {
+ hitPercentage: cacheHitPercentage,
+ read: cacheRead,
+ write: cacheWrite,
+ },
}
})
@@ -81,6 +92,11 @@ export function Sidebar(props: { sessionID: string }) {
</text>
<text fg={theme.textMuted}>{context()?.tokens ?? 0} tokens</text>
<text fg={theme.textMuted}>{context()?.percentage ?? 0}% used</text>
+ <Show when={context()?.cache !== undefined}>
+ <text style={{ fg: context()!.cache.hitPercentage > 0 ? theme.success : theme.textMuted }}>
+ {context()!.cache.hitPercentage}% cached
+ </text>
+ </Show>
<text fg={theme.textMuted}>{cost()} spent</text>
</box>
<Show when={mcpEntries().length > 0}>
Maybe the correct way would be to do something like this instead :thinking:
const totalInput = last.tokens.input + last.tokens.cache.read + last.tokens.cache.write
const cacheHitPercentage = totalInput > 0 ? Math.round((last.tokens.cache.read / totalInput) * 100) : 0
Also, I think the whole ripgrep tree is introducing cache misses on file additions/deletions. It should probably be set on a per-session basis rather than updating automatically.
I'd be happy to do this, except I was informed by @thdxr that they are reworking this part of the application, so I assume they intend to take these ideas and fold them into whatever they're working on. I'm leaving this open until I hear otherwise, but I'm not sure it makes sense for me to do more implementation at this point without @thdxr's input. As for cache statistics, I have a number of ideas that would work, but I don't want to invest time into them only to have this closed out unmerged.
Given that the llm-centralization branch has been merged, is there any hope of bringing the improvements from here onto mainline? @thdxr
It seems the person who opened the PR is waiting for feedback, and until there is any, they're going to be left in the dark.