Integrate LiteLLM Token Counter for Improved Context Window Usage
Overview
Currently, we use a simplified token counting approach (roughly one token per 3 bytes of text) which, while fast, may significantly underutilize our context window. LiteLLM provides more accurate, model-specific token counting that we should integrate, since it is already a dependency.
Background
- Current implementation uses a basic byte-length heuristic (1 token per 3 bytes)
- LiteLLM offers model-specific token counting
- LiteLLM falls back to tiktoken when a model-specific tokenizer is unavailable
- No additional dependencies are needed, since LiteLLM is already installed
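For reference, the current heuristic is roughly equivalent to the following sketch (the helper name is illustrative, not RA.Aid's actual function):

```python
def estimate_tokens(text: str) -> int:
    """Rough heuristic: assume ~1 token per 3 bytes of UTF-8 text."""
    return len(text.encode("utf-8")) // 3

# 19 bytes of ASCII text -> 6 estimated tokens
print(estimate_tokens("Hey, how's it going"))
```

Because real tokenizers average closer to 4 characters per token for English text, this heuristic tends to overcount, causing the agent to trim context earlier than necessary.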
Acceptance Criteria
- [ ] Replace current token counting with LiteLLM's token_counter
- [ ] Implement fallback mechanism for invalid message structures
- [ ] Verify token counting works with all supported models
- [ ] Update relevant documentation
- [ ] Add tests for token counting with different models
Implementation Details
Relevant Files
- ra_aid/agents/ciayn_agent.py: Contains current token estimation logic
- ra_aid/agent_utils.py: May need updates for token counting integration
Example Implementation
```python
from litellm import token_counter

messages = [{"role": "user", "content": "Hey, how's it going"}]
print(token_counter(model="gpt-3.5-turbo", messages=messages))
```
Testing Requirements
- Test with all commonly used models:
  - GPT models
  - Claude models
  - Deepseek models
  - Other supported providers
- Verify handling of invalid message structures
- Compare token counts with the current implementation
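A minimal comparison between the heuristic and LiteLLM's counter might look like the sketch below (guarded so it degrades gracefully when litellm is unavailable; the helper name is illustrative):

```python
def estimate_tokens(text: str) -> int:
    """Current-style heuristic: ~1 token per 3 bytes of UTF-8 text."""
    return len(text.encode("utf-8")) // 3

messages = [{"role": "user", "content": "Hey, how's it going"}]

try:
    from litellm import token_counter
    accurate = token_counter(model="gpt-3.5-turbo", messages=messages)
except ImportError:
    accurate = None  # litellm not installed in this environment

estimated = estimate_tokens(messages[0]["content"])
print(f"heuristic: {estimated}, litellm: {accurate}")
```

Running this across representative prompts for each provider would quantify how far the heuristic diverges from the real counts.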
Error Handling
- Implement fallback to current estimation method if token_counter fails
- Log warnings when falling back to estimation
- Track failed token counting attempts for monitoring
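The fallback behavior described above could be wrapped as follows (a sketch; `count_tokens` and the logging setup are hypothetical, not RA.Aid's actual API):

```python
import logging

logger = logging.getLogger(__name__)

def count_tokens(model: str, messages: list) -> int:
    """Count tokens with LiteLLM, falling back to the byte-length heuristic."""
    try:
        from litellm import token_counter
        return token_counter(model=model, messages=messages)
    except Exception as exc:
        # Covers invalid message structures, missing tokenizers, and
        # litellm itself being unavailable.
        logger.warning("token_counter failed (%s); using byte-length estimate", exc)
        text = "".join(str(m.get("content", "")) for m in messages)
        return len(text.encode("utf-8")) // 3
```

The warning log gives a hook for tracking how often the fallback path is taken.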
Potential Risks
- API errors due to invalid message structure/ordering
- Performance impact of more complex token counting
- Compatibility issues with certain models
References
- LiteLLM Token Usage Documentation
- Current token counting implementation in CiaynAgent class
Notes
- Implementation should be thoroughly tested with all supported models
- Consider adding configuration option to fall back to estimated counting
- Monitor performance impact of new token counting implementation
- Should test with long-running sessions to make sure the token limiter works as expected with Sonnet 3.5 and does NOT hit the maximum-context-window API error (which breaks the session).