Integrate LiteLLM Token Counter for Improved Context Window Usage
Overview
Currently, we use a simplified token counting approach (roughly one token per 3 bytes of text) which, while fast, may significantly underutilize our context window. LiteLLM provides more accurate, model-specific token counting that we should integrate, since it is already a dependency.
Background
- Current implementation uses a basic byte-length heuristic (1 token per 3 bytes)
- LiteLLM offers model-specific token counting
- LiteLLM falls back to tiktoken when a model-specific tokenizer is unavailable
- No additional dependencies are needed, since LiteLLM is already installed
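For reference, the current heuristic is roughly equivalent to the following sketch (the helper name is illustrative, not RA.Aid's actual function):

```python
def estimate_tokens(text: str) -> int:
    """Rough heuristic: assume ~1 token per 3 bytes of UTF-8 text."""
    return len(text.encode("utf-8")) // 3

# 19 bytes of ASCII text -> 6 estimated tokens
print(estimate_tokens("Hey, how's it going"))
```

Because real tokenizers average closer to 4 characters per token for English text, this heuristic tends to overcount, causing the agent to trim context earlier than necessary.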
Acceptance Criteria
- [ ] Replace current token counting with LiteLLM's token_counter
- [ ] Implement fallback mechanism for invalid message structures
- [ ] Verify token counting works with all supported models
- [ ] Update relevant documentation
- [ ] Add tests for token counting with different models
Implementation Details
Relevant Files
- ra_aid/agents/ciayn_agent.py: Contains current token estimation logic
- ra_aid/agent_utils.py: May need updates for token counting integration
Example Implementation
```python
from litellm import token_counter

messages = [{"role": "user", "content": "Hey, how's it going"}]
print(token_counter(model="gpt-3.5-turbo", messages=messages))
```
Testing Requirements
- Test with all commonly used models:
  - GPT models
  - Claude models
  - Deepseek models
  - Other supported providers
- Verify handling of invalid message structures
- Compare token counts with the current implementation
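A minimal comparison between the heuristic and LiteLLM's counter might look like the sketch below (guarded so it degrades gracefully when litellm is unavailable; the helper name is illustrative):

```python
def estimate_tokens(text: str) -> int:
    """Current-style heuristic: ~1 token per 3 bytes of UTF-8 text."""
    return len(text.encode("utf-8")) // 3

messages = [{"role": "user", "content": "Hey, how's it going"}]

try:
    from litellm import token_counter
    accurate = token_counter(model="gpt-3.5-turbo", messages=messages)
except ImportError:
    accurate = None  # litellm not installed in this environment

estimated = estimate_tokens(messages[0]["content"])
print(f"heuristic: {estimated}, litellm: {accurate}")
```

Running this across representative prompts for each provider would quantify how far the heuristic diverges from the real counts.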
Error Handling
- Implement fallback to current estimation method if token_counter fails
- Log warnings when falling back to estimation
- Track failed token counting attempts for monitoring
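The fallback behavior described above could be wrapped as follows (a sketch; `count_tokens` and the logging setup are hypothetical, not RA.Aid's actual API):

```python
import logging

logger = logging.getLogger(__name__)

def count_tokens(model: str, messages: list) -> int:
    """Count tokens with LiteLLM, falling back to the byte-length heuristic."""
    try:
        from litellm import token_counter
        return token_counter(model=model, messages=messages)
    except Exception as exc:
        # Covers invalid message structures, missing tokenizers, and
        # litellm itself being unavailable.
        logger.warning("token_counter failed (%s); using byte-length estimate", exc)
        text = "".join(str(m.get("content", "")) for m in messages)
        return len(text.encode("utf-8")) // 3
```

The warning log gives a hook for tracking how often the fallback path is taken.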
Potential Risks
- API errors due to invalid message structure/ordering
- Performance impact of more complex token counting
- Compatibility issues with certain models
References
- LiteLLM Token Usage Documentation
- Current token counting implementation in CiaynAgent class
Notes
- Implementation should be thoroughly tested with all supported models
- Consider adding configuration option to fall back to estimated counting
- Monitor performance impact of new token counting implementation
- Should test with long-running sessions to make sure the token limiter works as expected with Sonnet 3.5 and does NOT hit the maximum-context-window API error (which breaks the session).