Track Cached Token Usage Separately
Summary
Track cached input token reads and writes separately from non-cached input tokens.
If the provider does not support cached tokens, use `None` instead of `0` so the two cases are distinguishable.
If the provider does support cached tokens, fetch the cached-token price from OpenRouter and use it when displaying the cost in the UI.
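As a minimal sketch of the data model described above (all names here are illustrative, not the PR's actual types): `Option` distinguishes "provider doesn't report cached tokens" (`None`) from "zero cached tokens" (`Some(0)`), and cached reads/writes are priced separately when a cached price is available.

```rust
// Hypothetical token-usage record: cached reads/writes are tracked
// separately from non-cached input tokens. None = provider does not
// report cached tokens; Some(0) = provider reports zero cached tokens.
#[derive(Debug, Default, Clone)]
struct TokenUsage {
    input_tokens: Option<u64>,        // non-cached input tokens
    cached_read_tokens: Option<u64>,  // cache reads (None if unsupported)
    cached_write_tokens: Option<u64>, // cache writes (None if unsupported)
    output_tokens: Option<u64>,
}

// Prices in USD per token. The cached prices would be fetched from
// OpenRouter when the provider supports caching; these field names
// are assumptions for illustration.
struct Pricing {
    input: f64,
    output: f64,
    cache_read: Option<f64>,
    cache_write: Option<f64>,
}

// Compute the displayed cost, falling back to the normal input price
// if no cached price is known.
fn cost(usage: &TokenUsage, p: &Pricing) -> f64 {
    let t = |v: Option<u64>| v.unwrap_or(0) as f64;
    t(usage.input_tokens) * p.input
        + t(usage.output_tokens) * p.output
        + t(usage.cached_read_tokens) * p.cache_read.unwrap_or(p.input)
        + t(usage.cached_write_tokens) * p.cache_write.unwrap_or(p.input)
}
```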
Type of Change
- [x] Feature
- [ ] Bug fix
- [ ] Refactor / Code quality
- [ ] Performance improvement
- [ ] Documentation
- [ ] Tests
- [ ] Security fix
- [ ] Build / Release
- [ ] Other (specify below)
Testing
Added tests; manually tested in the desktop UI and CLI, and checked the SQLite database to verify the migration ran and values are set correctly.
Related Issues
Relates to #4988
Discussion:
@mikehw @katzdave Do we still want to do this one? A lot has changed since then, right?
I'll take a crack at the merge conflicts.
Is it still worth getting in, though, in its current form? I thought we comprehensively reworked token counting and compaction between when this was originally filed and now.
It would be nice to be able to track the API cost of a session accurately. Since OpenAI automatically caches input, I don't think an accurate cost is possible with OpenAI without tracking the cached tokens separately. If this PR isn't the right approach to get there after the refactor, that's okay; feel free to close.
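To make the OpenAI point concrete: OpenAI's usage object reports cached tokens via `prompt_tokens_details.cached_tokens`, and that count is included in `prompt_tokens`, so an accurate cost has to subtract it out and bill it at the discounted cached rate. A sketch of that arithmetic (the prices used in the example are illustrative, not official rates):

```rust
// Accurate cost when the provider bills cached input reads at a
// discount and reports cached tokens as a subset of prompt tokens
// (as OpenAI does via prompt_tokens_details.cached_tokens).
fn openai_style_cost(
    prompt_tokens: u64,  // includes cached tokens
    cached_tokens: u64,  // subset of prompt_tokens billed at cached rate
    output_tokens: u64,
    input_price: f64,    // USD per non-cached input token
    cached_price: f64,   // USD per cached input token
    output_price: f64,   // USD per output token
) -> f64 {
    // Subtract cached tokens to avoid double-billing them at full price.
    let uncached = prompt_tokens.saturating_sub(cached_tokens);
    uncached as f64 * input_price
        + cached_tokens as f64 * cached_price
        + output_tokens as f64 * output_price
}
```

Without the subtraction, a heavily cached session would be over-reported by the full input price on every cached token.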
Fixed the conflicts/ci + pushed a small change to include these new tokens in the message stream.
I think generally in good shape, if anyone else wants to take another final look.
Did some more testing. @mikehw would you mind taking another pass at the provider-specific code? I'm seeing very low token counts for the current session's input tokens (e.g. <100 tokens). The system prompt is around 7k tokens, so I'd imagine we still want to show that even though it's cached, since it still consumes that much of your context window.
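The distinction being raised here is that cached tokens are billed differently but still occupy the context window, so the displayed input count should sum both. A tiny sketch of that display-side accounting (function name is hypothetical):

```rust
// Cached tokens still consume context-window space even though they
// are billed at a discount, so the UI's input count should include
// them alongside the non-cached input tokens.
fn context_input_tokens(uncached: u64, cached_read: Option<u64>) -> u64 {
    uncached + cached_read.unwrap_or(0)
}
```

With this, a ~7k-token cached system prompt plus <100 uncached tokens would display as ~7k, not <100.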
In particular, I was testing with the Anthropic format.
I think we should just try to split the tokens on every call between `accumulated` and `cached_accumulated_tokens`, since those are the ones used to compute cost.
@DOsinga I'm thinking of potentially merging everything except the provider-specific code here to get the cached_token mechanism in, and then we can tackle those more incrementally.
Closing for now as some time has passed but I'm going to pick this feature back up.