
Integrate LiteLLM Token Counter for Improved Context Window Usage

Open · ariel-frischer opened this issue 1 year ago · 0 comments


Overview

Currently, we use a simplified token counting approach (roughly len(text) // 3) which, while fast, can miscount significantly and leave much of the context window unused. LiteLLM provides more accurate, model-specific token counting, and since it is already a dependency, we should integrate it.

Background

  • Current implementation uses a basic byte-length heuristic (~1 token per 3 bytes)
  • LiteLLM offers model-specific token counting
  • LiteLLM falls back to tiktoken when a model-specific tokenizer is unavailable
  • No additional dependencies needed (LiteLLM is already installed)
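
For reference, the current heuristic can be sketched as follows (the helper name `estimate_tokens` is illustrative; the actual function in ciayn_agent.py may differ):

```python
def estimate_tokens(text: str) -> int:
    # Current approach: roughly 1 token per 3 bytes of UTF-8 text.
    # Fast, but unaware of the model's actual tokenizer, so counts
    # can drift far from what the API will bill or enforce.
    return len(text.encode("utf-8")) // 3

print(estimate_tokens("Hey, how's it going"))  # 19 bytes // 3 -> 6
```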

Acceptance Criteria

  • [ ] Replace current token counting with LiteLLM's token_counter
  • [ ] Implement fallback mechanism for invalid message structures
  • [ ] Verify token counting works with all supported models
  • [ ] Update relevant documentation
  • [ ] Add tests for token counting with different models

Implementation Details

Relevant Files

  • ra_aid/agents/ciayn_agent.py: Contains current token estimation logic
  • ra_aid/agent_utils.py: May need updates for token counting integration

Example Implementation

from litellm import token_counter

messages = [{"role": "user", "content": "Hey, how's it going"}]
print(token_counter(model="gpt-3.5-turbo", messages=messages))

Testing Requirements

  • Test with all commonly used models:
    • GPT models
    • Claude models
    • Deepseek models
    • Other supported providers
  • Verify handling of invalid message structures
  • Compare token counts with current implementation
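
A cross-model smoke test along these lines could cover the first bullet (the model IDs below are placeholders, not necessarily the ones the project supports; litellm is skipped gracefully if absent):

```python
messages = [{"role": "user", "content": "Hey, how's it going"}]

# Illustrative model IDs; substitute the models RA.Aid actually supports.
MODELS = ["gpt-3.5-turbo", "claude-3-5-sonnet-20240620", "deepseek/deepseek-chat"]

def smoke_test_token_counts() -> dict:
    """Return {model: token count}, with None marking models that failed."""
    try:
        from litellm import token_counter
    except ImportError:
        return {}  # litellm unavailable in this environment; nothing to check
    counts = {}
    for model in MODELS:
        try:
            counts[model] = token_counter(model=model, messages=messages)
        except Exception:
            counts[model] = None  # flag models whose tokenizer lookup failed
    return counts
```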

Error Handling

  • Implement fallback to current estimation method if token_counter fails
  • Log warnings when falling back to estimation
  • Track failed token counting attempts for monitoring
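
A minimal sketch of such a fallback wrapper (the names `count_tokens`, `estimate_tokens`, and the module-level counter are assumptions for illustration, not the project's actual API):

```python
import logging

logger = logging.getLogger(__name__)
failed_count_attempts = 0  # simple module-level counter for monitoring


def estimate_tokens(text: str) -> int:
    """Current byte-length heuristic: roughly 1 token per 3 bytes."""
    return len(text.encode("utf-8")) // 3


def count_tokens(model: str, messages: list) -> int:
    """Prefer LiteLLM's token_counter; fall back to the byte heuristic."""
    global failed_count_attempts
    try:
        from litellm import token_counter
        return token_counter(model=model, messages=messages)
    except Exception as exc:  # invalid message structure, unknown model, etc.
        failed_count_attempts += 1
        logger.warning("token_counter failed (%s); falling back to estimation", exc)
        text = "".join(m.get("content", "") for m in messages)
        return estimate_tokens(text)
```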

Potential Risks

  • API errors due to invalid message structure/ordering
  • Performance impact of more complex token counting
  • Compatibility issues with certain models

Notes

  • Implementation should be thoroughly tested with all supported models
  • Consider adding configuration option to fall back to estimated counting
  • Monitor performance impact of new token counting implementation
  • Test long-running sessions with Sonnet 3.5 to confirm the token limiter works as expected and does NOT hit the maximum-context-window API error, which currently breaks the session

ariel-frischer · Jan 31 '25 20:01