feat(provider): add provider-specific cache configuration system (significant token usage reduction)
Summary
Implements a comprehensive `ProviderConfig` system for provider-specific caching and prompt optimization.
Closes #5416
Test Results with Claude Opus 4.5 (my primary target for improvement)
A/B testing with identical prompts (same session, same agent) comparing legacy vs optimized behavior:
| Metric | Legacy | Optimized | Improvement |
|---|---|---|---|
| Cache writes (post-warmup) | 18,417 tokens | ~10,340 tokens | 44% reduction |
| Effective cost (3rd prompt) | 13,021 tokens | 3,495 tokens | 73% reduction |
| Initial cache write | 16,211 tokens | 17,987 tokens | +11% (expected) |
| Cache hit rate | 100% | 100% | Same |
The slightly larger initial cache write (+11%, roughly 1,800 extra tokens) is quickly amortized by dramatically fewer cache invalidations in subsequent requests: by the third prompt the effective cost is already roughly 9,500 tokens lower, more than recovering the up-front cost.
Provider testing
I tested with the providers I have access to. I can confirm it works with Anthropic (Opus 4.5), Google (Gemini 3 Pro preview), and Mistral (Devstral 2).
Changes
- Add `ProviderConfig` namespace with defaults for 19+ providers
- Support three caching paradigms (see the sketch after this list):
  - Explicit breakpoint (Anthropic, Bedrock, Google Vertex Anthropic)
  - Automatic prefix (OpenAI, Azure, GitHub Copilot, DeepSeek)
  - Implicit/content-based (Google/Gemini)
- Add tool sorting for cache consistency across requests
- Add tool caching for explicit breakpoint providers
- Add user config overrides via `opencode.json` (provider and agent level)
- Simplify system message handling with a `combineSystemMessages` boolean
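To make the paradigm split concrete, here is a minimal sketch of what the per-provider defaults could look like. All type and field names below are illustrative assumptions, not the actual shape in `src/provider/config.ts`:

```ts
// Illustrative sketch only; the real shape in src/provider/config.ts may differ.
type CacheParadigm = "explicit-breakpoint" | "automatic-prefix" | "implicit"

interface ProviderCacheDefaults {
  paradigm: CacheParadigm
  cacheTools?: boolean            // place a cache breakpoint after the tool definitions
  sortTools?: boolean             // keep tool ordering stable so the cached prefix never shifts
  combineSystemMessages?: boolean // collapse system messages into a single block before caching
  minTokens?: number              // smallest prompt the provider will cache, where one is enforced
}

const PROVIDER_DEFAULTS: Record<string, ProviderCacheDefaults> = {
  // Explicit breakpoint: the client marks cache points in the request (Anthropic-style)
  anthropic: {
    paradigm: "explicit-breakpoint",
    cacheTools: true,
    sortTools: true,
    combineSystemMessages: true,
    minTokens: 1024,
  },
  // Automatic prefix: the provider caches matching prompt prefixes on its own (OpenAI-style)
  openai: {
    paradigm: "automatic-prefix",
    sortTools: true,
  },
  // Implicit/content-based: caching is decided entirely server-side (Gemini-style)
  google: {
    paradigm: "implicit",
    sortTools: true,
  },
}
```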
Config Priority
Provider defaults → User provider config → User agent config
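As a rough sketch of that precedence (the names here are hypothetical, not the PR's actual API), later layers simply override earlier ones:

```ts
// Hypothetical sketch of the override precedence; not code from this PR.
interface CacheConfig {
  enabled?: boolean
  ttl?: "5m" | "1h"
  minTokens?: number
}

function resolveCacheConfig(
  providerDefaults: CacheConfig,
  userProviderConfig: CacheConfig = {},
  userAgentConfig: CacheConfig = {},
): CacheConfig {
  // Later sources win: defaults < user provider config < user agent config
  return { ...providerDefaults, ...userProviderConfig, ...userAgentConfig }
}

// With the example config below, the "plan" agent ends up with ttl "1h" and
// minTokens 2048 layered on top of the built-in defaults (illustrative values).
const resolved = resolveCacheConfig(
  { enabled: true, ttl: "5m", minTokens: 1024 }, // built-in provider defaults
  { enabled: true, ttl: "1h", minTokens: 2048 }, // opencode.json provider block
  { ttl: "1h" },                                 // opencode.json agent block
)
```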
Area for future optimization
Currently, models.dev doesn't provide information about minimum cache sizes or prompt requirements, so these had to be written out as configuration here. It would be ideal if the model definitions were updated with this detail. Until then, as providers/models are added or updated, the configuration should be updated to match for optimal performance.
New Files
- `src/provider/config.ts` (874 lines)
- `test/provider/config.test.ts` (215 tests)
Example Config
{
  "provider": {
    "anthropic": {
      "cache": {
        "enabled": true,
        "ttl": "1h",
        "minTokens": 2048
      }
    }
  },
  "agent": {
    "plan": {
      "cache": {
        "ttl": "1h"
      }
    }
  }
}
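For an explicit-breakpoint provider such as Anthropic, settings like these ultimately become cache markers on the outgoing request. Below is a hedged sketch of that mapping; it mirrors Anthropic's prompt-caching request shape and is not code from this PR:

```ts
// Hedged sketch, not code from this PR: attach a cache breakpoint to the last
// system block, carrying the ttl resolved from the config above. A real
// implementation would also skip the breakpoint for prompts below minTokens.
const resolved = { enabled: true, ttl: "1h" as const, minTokens: 2048 }
const systemPrompt = "You are opencode..." // placeholder; the real prompt is assembled elsewhere

const systemBlocks = resolved.enabled
  ? [
      {
        type: "text" as const,
        text: systemPrompt,
        cache_control: { type: "ephemeral" as const, ttl: resolved.ttl },
      },
    ]
  : [{ type: "text" as const, text: systemPrompt }]
```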
Please merge this ❤️
I'm deciding whether we should use opencode at our company, and I'm actually surprised this wasn't implemented until now.
Updated one of the comments to be clearer, but no functional changes.
hey thanks for working on this - i'm actually currently mid refactor on a bunch of code where we call LLMs so we probably can't explicitly accept this PR (llm-centralization branch)
also i don't know if we want to go as far as making all this deeply configurable yet
could you lay out all the things you improved on top of what we have currently? then i can make sure those get included
i can also just read the PR/have opencode summarize haha. i'll do a pass once that llm-centralization branch is merged
Ok, that's fine. I added comments throughout that should help your opencode summarization, but I'm happy to discuss if you need more clarity; I'm on Discord as well. I was initially going to refactor all of your provider handling, but it sounds like you're already doing that, so I designed this PR as a stepping stone towards it rather than making my very first PR a massive rewrite of lots of things.
If you just want to take my work and fold it into yours, I understand; at the end of the day it's of huge benefit to the users, so I'll be happy. Thank you for taking the time to evaluate things, and I hope to see your rewrite soon! I can also just update this once you're done/merged in, just let me know what works best.
Hey, thanks for this PR. I started playing around with it. Can you do me a favor and add the following patch to visualize token cache statistics in the sidebar? FYI: I am not 100% confident that this is the best way of calculating the percentage.
Should this be shown in the sidebar? I think it should: not only will it help us spot issues, it will also let users notice models that do not support caching.
diff --git a/packages/opencode/src/cli/cmd/tui/routes/session/sidebar.tsx b/packages/opencode/src/cli/cmd/tui/routes/session/sidebar.tsx
index c1c29a73..afaaad73 100644
--- a/packages/opencode/src/cli/cmd/tui/routes/session/sidebar.tsx
+++ b/packages/opencode/src/cli/cmd/tui/routes/session/sidebar.tsx
@@ -42,9 +42,20 @@ export function Sidebar(props: { sessionID: string }) {
const total =
last.tokens.input + last.tokens.output + last.tokens.reasoning + last.tokens.cache.read + last.tokens.cache.write
const model = sync.data.provider.find((x) => x.id === last.providerID)?.models[last.modelID]
+
+ // Calculate cache hit percentage
+ const cacheHitPercentage = total > 0 ? Math.round((last.tokens.cache.read / total) * 100) : 0
+ const cacheRead = last.tokens.cache.read
+ const cacheWrite = last.tokens.cache.write
+
return {
tokens: total.toLocaleString(),
percentage: model?.limit.context ? Math.round((total / model.limit.context) * 100) : null,
+ cache: {
+ hitPercentage: cacheHitPercentage,
+ read: cacheRead,
+ write: cacheWrite,
+ },
}
})
@@ -81,6 +92,11 @@ export function Sidebar(props: { sessionID: string }) {
</text>
<text fg={theme.textMuted}>{context()?.tokens ?? 0} tokens</text>
<text fg={theme.textMuted}>{context()?.percentage ?? 0}% used</text>
+ <Show when={context()?.cache !== undefined}>
+ <text style={{ fg: context()!.cache.hitPercentage > 0 ? theme.success : theme.textMuted }}>
+ {context()!.cache.hitPercentage}% cached
+ </text>
+ </Show>
<text fg={theme.textMuted}>{cost()} spent</text>
</box>
<Show when={mcpEntries().length > 0}>
Maybe the correct way would be to do something like this instead :thinking:
const totalInput = last.tokens.input + last.tokens.cache.read + last.tokens.cache.write
const cacheHitPercentage = totalInput > 0 ? Math.round((last.tokens.cache.read / totalInput) * 100) : 0
Also, I think the whole ripgrep tree is introducing cache misses on file additions/deletions. It should probably be set on a per-session basis rather than updating automatically.
I'd be happy to do this, except I was informed by @thdxr that they are reworking this part of the application, so I assume they intend to take these ideas and fold them into whatever they're working on. I'm leaving this open until I hear otherwise, but I'm not sure it makes sense for me to do more implementation at this point without @thdxr's input. As for cache statistics, I have a number of ideas that would work, but I don't want to invest time into them only to have this closed out unmerged.
Given that the llm-centralization branch has been merged, is there any hope of bringing the improvements from here onto mainline? @thdxr
It seems the person who opened the PR is waiting for feedback, and until there is any, they're going to be left in the dark.