
Track Cached Token Usage Separately

mikehw opened this issue 2 months ago

Summary

Track cached input token reads and writes separately from non-cached input tokens.

If the provider does not support cached tokens, use None instead of 0.

If the provider supports cached tokens, fetch their price from OpenRouter and use it when displaying the cost in the UI.
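The summary above could map onto a token-usage record along these lines. This is a minimal sketch; the struct, field, and method names are hypothetical and not taken from the goose codebase:

```rust
// Hypothetical token-usage record. `Option` distinguishes "provider
// reported zero cached tokens" from "provider has no cache support",
// matching the `None` vs `0` distinction described above.
#[derive(Debug, Clone, Copy)]
struct TokenUsage {
    input_tokens: u64,               // non-cached input tokens
    cached_read_tokens: Option<u64>, // None => provider lacks caching
    cached_write_tokens: Option<u64>,
    output_tokens: u64,
}

impl TokenUsage {
    /// Cost in USD given per-million-token prices (e.g. fetched from
    /// OpenRouter). Cached reads are billed at their own, usually
    /// discounted, rate; `None` simply contributes nothing.
    fn cost(&self, input_price: f64, cached_read_price: f64, output_price: f64) -> f64 {
        let cached = self.cached_read_tokens.unwrap_or(0) as f64;
        (self.input_tokens as f64 * input_price
            + cached * cached_read_price
            + self.output_tokens as f64 * output_price)
            / 1_000_000.0
    }
}

fn main() {
    let usage = TokenUsage {
        input_tokens: 1_000,
        cached_read_tokens: Some(9_000),
        cached_write_tokens: None,
        output_tokens: 500,
    };
    // Illustrative prices: $3/M input, $0.30/M cached read, $15/M output.
    println!("{:.6}", usage.cost(3.0, 0.3, 15.0));
}
```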

Type of Change

  • [x] Feature
  • [ ] Bug fix
  • [ ] Refactor / Code quality
  • [ ] Performance improvement
  • [ ] Documentation
  • [ ] Tests
  • [ ] Security fix
  • [ ] Build / Release
  • [ ] Other (specify below)

Testing

Added tests; manually tested in the desktop UI and CLI, checking the SQLite database to confirm the migration ran and the values were set correctly.

Related Issues

Relates to #4988
Discussion:

Screenshots/Demos (for UX changes)

After:
[Screenshot 2025-10-03 at 5:37:16 PM]

mikehw avatar Oct 04 '25 17:10 mikehw

@mikehw @katzdave Do we still want to do this one? A lot has changed since then, right?

alexhancock avatar Nov 06 '25 21:11 alexhancock

I'll take a crack at the merge conflicts.

katzdave avatar Nov 06 '25 23:11 katzdave

Is it still worth getting in though in its current form? I thought we comprehensively reworked token counting and compaction in the time between when this was originally filed and now.

alexhancock avatar Nov 07 '25 01:11 alexhancock

It would be nice to be able to track the API cost of a session accurately. Since OpenAI automatically caches input, I don't think getting an accurate cost is possible with OpenAI without tracking the cached tokens separately. If this PR isn't the right approach to get there after the refactor, that's okay; feel free to close.
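For reference, OpenAI's chat completions `usage` object reports cached reads under `prompt_tokens_details.cached_tokens`, and `prompt_tokens` includes the cached portion, so the non-cached part has to be derived by subtraction. A sketch of that split (the function name is hypothetical):

```rust
/// OpenAI's `usage.prompt_tokens` includes cached tokens, so the
/// non-cached portion is derived by subtraction. `cached` is `None`
/// when `prompt_tokens_details` is absent from the response, which
/// maps to the "use None instead of 0" rule in the PR summary.
fn split_openai_input(prompt_tokens: u64, cached: Option<u64>) -> (u64, Option<u64>) {
    match cached {
        Some(c) => (prompt_tokens.saturating_sub(c), Some(c)),
        None => (prompt_tokens, None),
    }
}

fn main() {
    // e.g. 7_200 prompt tokens, of which 7_040 were served from cache
    assert_eq!(split_openai_input(7_200, Some(7_040)), (160, Some(7_040)));
    assert_eq!(split_openai_input(7_200, None), (7_200, None));
    println!("ok");
}
```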

mikehw avatar Nov 07 '25 16:11 mikehw

Fixed the conflicts/CI and pushed a small change to include these new tokens in the message stream.

I think it's generally in good shape, if anyone else wants to take another final look.

katzdave avatar Nov 07 '25 21:11 katzdave

Did some more testing. @mikehw, would you mind taking another pass at the provider-specific code? I'm seeing very low token counts for the current session's input tokens (e.g. <100 tokens). The system prompt is around 7k tokens, so I'd imagine we still want to show that even though it's cached, since it still consumes that much of your context window.

In particular, I was testing with the Anthropic format.
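The low counts are consistent with Anthropic's reporting: its `usage.input_tokens` excludes cached tokens, which arrive separately as `cache_read_input_tokens` and `cache_creation_input_tokens`, so all three have to be summed to recover the real context-window consumption. A sketch, assuming those field semantics (the function name is hypothetical):

```rust
/// Anthropic reports cached reads/writes outside `input_tokens`,
/// so the real context-window usage is the sum of all three.
fn anthropic_context_tokens(input_tokens: u64, cache_read: u64, cache_creation: u64) -> u64 {
    input_tokens + cache_read + cache_creation
}

fn main() {
    // e.g. a ~7k-token cached system prompt: `input_tokens` alone shows
    // <100 even though ~7.1k tokens of the context window are in use.
    assert_eq!(anthropic_context_tokens(90, 7_000, 0), 7_090);
    println!("ok");
}
```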

I think we should just try to split the tokens on every call between `accumulated` and `cached_accumulated_tokens`, since those are the ones used to compute cost.
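That per-call split could accumulate along these lines. A sketch only; the names mirror this comment, and the actual schema in goose may differ:

```rust
/// Running session totals, split so each bucket can be billed at its
/// own rate (normal input rate vs discounted cached-read rate).
#[derive(Default, Debug)]
struct SessionTotals {
    accumulated_tokens: u64,        // non-cached input, normal rate
    cached_accumulated_tokens: u64, // cached reads, cached rate
}

impl SessionTotals {
    /// Record one model call, splitting its input into the two buckets.
    fn record_call(&mut self, non_cached_input: u64, cached_input: u64) {
        self.accumulated_tokens += non_cached_input;
        self.cached_accumulated_tokens += cached_input;
    }
}

fn main() {
    let mut totals = SessionTotals::default();
    totals.record_call(150, 7_000); // first call: system prompt cached
    totals.record_call(80, 7_150);  // follow-up call reuses the cache
    assert_eq!(totals.accumulated_tokens, 230);
    assert_eq!(totals.cached_accumulated_tokens, 14_150);
    println!("{:?}", totals);
}
```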

katzdave avatar Nov 07 '25 21:11 katzdave

@DOsinga I'm thinking of potentially merging everything except the provider-specific code here to get the cached-token mechanism in, and then we can tackle the rest more incrementally.

katzdave avatar Nov 18 '25 18:11 katzdave

Closing for now, as some time has passed, but I'm going to pick this feature back up.

katzdave avatar Nov 20 '25 18:11 katzdave