perf: optimize Anthropic prompt caching for reduced token costs

ormandj opened this issue

Summary

Optimizes Anthropic prompt caching to significantly reduce token costs for Claude models through two key improvements:

Tool caching: Adds a cache breakpoint after the tool definitions
System prompt reordering: Places stable, shared content first to maximize cache prefix hits across agents
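The tool-caching idea can be sketched as follows. This assumes simplified request shapes modeled on the Anthropic Messages API: tools with name, description and input_schema, plus an optional cache_control of type "ephemeral" with a "5m" or "1h" ttl. Anthropic caches the whole prompt prefix (tools, then system, then messages) up to and including the block that carries cache_control, so one marker on the last tool covers every tool definition. markToolsForCaching is a hypothetical name. It is not the PR's ProviderTransform.applyToolCaching().

```typescript
// Simplified, hypothetical shapes modeled on the Anthropic Messages API.
type Ttl = "5m" | "1h";
type CacheControl = { type: "ephemeral"; ttl?: Ttl };

interface ToolDefinition {
  name: string;
  description: string;
  input_schema: Record<string, unknown>;
  cache_control?: CacheControl;
}

// Hypothetical helper: returns a copy of `tools` in which only the last tool
// carries a cache breakpoint. Anthropic caches everything up to and including
// the marked block, so one marker on the final tool covers all tool definitions.
function markToolsForCaching(tools: ToolDefinition[], ttl: Ttl = "5m"): ToolDefinition[] {
  return tools.map((tool, i): ToolDefinition => {
    // Strip any existing marker so the tools array ends up with exactly one breakpoint.
    const { cache_control: _existing, ...rest } = tool;
    return i === tools.length - 1
      ? { ...rest, cache_control: { type: "ephemeral", ttl } }
      : rest;
  });
}

// Usage: two tools in, only the second one marked.
const tools: ToolDefinition[] = [
  { name: "read", description: "Read a file", input_schema: { type: "object" } },
  { name: "bash", description: "Run a shell command", input_schema: { type: "object" } },
];
const cached = markToolsForCaching(tools);
console.log(cached.map((t) => t.cache_control?.type ?? "none"));
```

Returning copies instead of mutating keeps the caller's tool list reusable for providers that do not understand cache_control.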

Why

Anthropic's prompt caching uses prefix matching: the longer the matching prefix, the more tokens are served from cache at a 90% discount. The previous implementation had two inefficiencies:

Tools weren't cached: Tool definitions are stable but were re-sent uncached on every request
Agent-specific content came first: Because any difference early in the prompt invalidates the cached prefix after it, this broke cache sharing when switching between agents (e.g., code → explore → code)
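The effect of ordering on prefix reuse can be illustrated by counting how many leading system-prompt segments two agents share. The segment contents and the legacy order (agent prompt first) are assumptions for illustration; the optimized order is the one listed under Changes. The count is an upper bound on reuse: a real cache hit also requires a cache breakpoint at a position inside that shared run.

```typescript
// Hypothetical stand-ins for the system prompt pieces; only "agent" varies.
const sharedParts = {
  header: "<shared header>",
  custom: "<project instructions>",
  provider: "<provider-specific prompt>",
  environment: "<working directory, platform, date>",
};

type Part = keyof typeof sharedParts | "agent";

// Assemble the system prompt segments in the given order.
function buildSystem(order: Part[], agentPrompt: string): string[] {
  return order.map((part) => (part === "agent" ? agentPrompt : sharedParts[part]));
}

// Count identical leading segments: the most a prefix cache could reuse.
function sharedPrefixLength(a: string[], b: string[]): number {
  let n = 0;
  while (n < a.length && n < b.length && a[n] === b[n]) n++;
  return n;
}

// Assumed legacy order (agent first) vs. the order described in this PR.
const legacyOrder: Part[] = ["agent", "header", "custom", "provider", "environment"];
const optimizedOrder: Part[] = ["header", "custom", "provider", "agent", "environment"];

const orders: Record<string, Part[]> = { legacy: legacyOrder, optimized: optimizedOrder };
for (const [label, order] of Object.entries(orders)) {
  // Simulate switching from the code agent to the explore agent.
  const code = buildSystem(order, "<code agent prompt>");
  const explore = buildSystem(order, "<explore agent prompt>");
  console.log(`${label}: ${sharedPrefixLength(code, explore)} of ${order.length} segments shared`);
}
```

Under these assumptions the legacy order shares 0 of 5 segments between the two agents and the optimized order shares 3. The environment segment stays last because it is the most likely to change (it includes the date in this sketch).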

Test Results

A/B testing with identical prompts (same session, same agent) comparing legacy vs optimized behavior:

Metric	Legacy	Optimized	Change
Cache writes (post-warmup)	18,417 tokens	~10,340 tokens	44% reduction
Effective cost (3rd prompt)	13,021 units	3,495 units	73% reduction
Initial cache write	16,211 tokens	17,987 tokens	+11% (expected)
Cache hit rate	100%	100%	Same

The slightly larger initial cache write (+11%) is quickly amortized, because subsequent requests invalidate far less of the cache (44% fewer post-warmup cache writes).
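The percentages in the table follow from the raw numbers as (optimized - legacy) / legacy. The sketch below recomputes them and adds an effectiveInputCost helper that weights tokens with Anthropic's published multipliers for 5-minute caching (cache writes at 1.25× the base input price, cache reads at 0.1×). The PR does not define its "units", so that weighting is an assumption, not a reconstruction of how the effective-cost row was computed.

```typescript
// Percent change from legacy to optimized; negative values are reductions.
function percentChange(legacy: number, optimized: number): number {
  return ((optimized - legacy) / legacy) * 100;
}

// Raw numbers from the A/B table above.
const rows: Array<[string, number, number]> = [
  ["Cache writes (post-warmup)", 18417, 10340],
  ["Effective cost (3rd prompt)", 13021, 3495],
  ["Initial cache write", 16211, 17987],
];

for (const [metric, legacy, optimized] of rows) {
  console.log(`${metric}: ${percentChange(legacy, optimized).toFixed(1)}%`);
}

// Assumed weighting, relative to the base input-token price: Anthropic's
// published multipliers for 5-minute caching (write 1.25x, read 0.1x).
// This is NOT necessarily how the table's "units" were computed.
const WEIGHTS = { uncached: 1.0, cacheWrite: 1.25, cacheRead: 0.1 };

function effectiveInputCost(uncached: number, cacheWrite: number, cacheRead: number): number {
  return (
    uncached * WEIGHTS.uncached +
    cacheWrite * WEIGHTS.cacheWrite +
    cacheRead * WEIGHTS.cacheRead
  );
}
```

The loop prints -43.9%, -73.2% and 11.0%, matching the rounded figures in the table. Because cache writes cost more than plain input under this weighting, every avoided invalidation saves more than the write itself would have cost as uncached input.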

Changes

New ClaudeCache module with centralized cache control logic and configuration
Tool caching via ProviderTransform.applyToolCaching()
System prompt reordering: header → custom → provider → agent → environment (stable content first)
Configurable via provider.options.cache and agent.cache in opencode.json
Environment flags for testing: OPENCODE_LEGACY_CACHE=true reverts to the old behavior; OPENCODE_DISABLE_CACHE=true disables all caching
46 unit tests covering cache control application, TTL configuration, and provider compatibility
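A hypothetical sketch of how the two testing flags might be resolved into a single mode. Two behaviors here are assumptions, not taken from the PR: OPENCODE_DISABLE_CACHE wins when both flags are set, and only the exact string "true" enables a flag. CacheMode and resolveCacheMode are illustrative names.

```typescript
type CacheMode = "disabled" | "legacy" | "optimized";

// Hypothetical resolution of the PR's two testing flags. Assumptions:
// OPENCODE_DISABLE_CACHE takes precedence over OPENCODE_LEGACY_CACHE, and
// only the exact string "true" turns a flag on.
function resolveCacheMode(env: Record<string, string | undefined>): CacheMode {
  if (env.OPENCODE_DISABLE_CACHE === "true") return "disabled";
  if (env.OPENCODE_LEGACY_CACHE === "true") return "legacy";
  return "optimized";
}

// In Node this would be called as resolveCacheMode(process.env).
console.log(resolveCacheMode({}));
console.log(resolveCacheMode({ OPENCODE_LEGACY_CACHE: "true" }));
console.log(resolveCacheMode({ OPENCODE_LEGACY_CACHE: "true", OPENCODE_DISABLE_CACHE: "true" }));
```

Taking the environment as a plain record rather than reading process.env directly keeps the function easy to unit test.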

Configuration (optional)

{
  "provider": {
    "anthropic": {
      "options": {
        "cache": {
          "enabled": true,
          "toolsTtl": "5m",
          "instructionsTtl": "5m"
        }
      }
    }
  }
}

Agents can override provider settings via agent.cache.
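A sketch of how the cache settings from the JSON example above might be typed and combined with an agent.cache override. Assumptions not stated in the PR: the override is applied field by field rather than replacing the whole object, the defaults equal the values in the example, and the TTL fields accept Anthropic's two TTL values, "5m" and "1h". CacheConfig and effectiveCacheConfig are hypothetical names.

```typescript
// Hypothetical typing of the cache settings in opencode.json.
type CacheTtl = "5m" | "1h";

interface CacheConfig {
  enabled?: boolean;
  toolsTtl?: CacheTtl;
  instructionsTtl?: CacheTtl;
}

// Assumed defaults, mirroring the values in the JSON example.
const DEFAULT_CACHE: Required<CacheConfig> = {
  enabled: true,
  toolsTtl: "5m",
  instructionsTtl: "5m",
};

// Field-by-field precedence: agent.cache, then provider.options.cache, then defaults.
function effectiveCacheConfig(
  provider: CacheConfig = {},
  agent: CacheConfig = {},
): Required<CacheConfig> {
  return {
    enabled: agent.enabled ?? provider.enabled ?? DEFAULT_CACHE.enabled,
    toolsTtl: agent.toolsTtl ?? provider.toolsTtl ?? DEFAULT_CACHE.toolsTtl,
    instructionsTtl:
      agent.instructionsTtl ?? provider.instructionsTtl ?? DEFAULT_CACHE.instructionsTtl,
  };
}

// Usage: the provider block from the example plus a hypothetical agent override.
const config = JSON.parse(
  '{"provider":{"anthropic":{"options":{"cache":{"enabled":true,"toolsTtl":"5m","instructionsTtl":"5m"}}}}}',
);
const providerCache: CacheConfig = config.provider.anthropic.options.cache;
const agentCache: CacheConfig = { instructionsTtl: "1h" };
console.log(effectiveCacheConfig(providerCache, agentCache));
```

Using ?? rather than object spread means a field the agent leaves unset falls through to the provider value instead of being overwritten with undefined.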

Dec 11 '25 23:12 ormandj