Token-based condensation triggers
The current memory condensers use the size of the history (i.e., the number of Event objects it contains) as a proxy for how much information it holds. But the downstream token cost varies wildly from event to event.
These graphs show the size of events generated when solving SWE-bench Verified instances -- the larger the circle, the more tokens in the event:
The distribution shifts with the condenser used, the problem instance, the OpenHands version and prompts, etc. This leads to hard-to-debug situations where, even with condensation, we still exceed a model's maximum input token limit.
We should extend condensers to support truncation based on the number of tokens per event.
This requires that we 1) convert events to messages, then 2) approximate the number of tokens needed to encode each message (using custom tokenizers).
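A minimal sketch of what those two steps could look like, assuming litellm's `token_counter` helper (litellm is already a dependency); `event_to_message` is a hypothetical stand-in for the real event-to-message conversion:

```python
# Sketch of per-event token estimation. `event_to_message` is a
# hypothetical stand-in; `litellm.token_counter` is the real helper.
from litellm import token_counter


def event_to_message(event) -> dict:
    """Step 1 (hypothetical): render an event as a chat message."""
    return {'role': 'user', 'content': str(event)}


def count_event_tokens(events, model: str = 'gpt-4o') -> list[int]:
    """Step 2: approximate tokens per event with the model's tokenizer."""
    return [
        token_counter(model=model, messages=[event_to_message(e)])
        for e in events
    ]


def exceeds_budget(events, max_tokens: int, model: str = 'gpt-4o') -> bool:
    """Token-based trigger: condense once the history's total exceeds the budget."""
    return sum(count_event_tokens(events, model)) > max_tokens
```

Counting per event rather than per history also keeps the trigger incremental: a condenser can cache counts and only tokenize events it hasn't seen before.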
As an added bonus, the context truncation that happens in the agent controller when a ContextWindowExceededError is raised can be unified with the custom condenser implementations (see the sketch after this list):
- Configurable context truncation strategies (summarizing, forgetting, masking, etc.)
- Configurable max token limits
- Unlike the agent controller's context truncation, custom condensers do not modify the state, so we maintain a complete history
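As an illustrative sketch only (the class and parameter names are hypothetical, not the current OpenHands API), a unified condenser might expose the token budget and strategy as configuration while returning a condensed view instead of mutating the state:

```python
# Hypothetical unified condenser. Only the 'forget' strategy is shown;
# 'summarize' or 'mask' would replace the dropped prefix (e.g., with a
# summary event) instead of omitting it.
from dataclasses import dataclass
from typing import Callable, Literal

Strategy = Literal['forget', 'summarize', 'mask']


@dataclass
class TokenBudgetCondenser:
    count_tokens: Callable[[object], int]  # per-event token estimator
    max_tokens: int                        # configurable token limit
    strategy: Strategy = 'forget'          # configurable truncation strategy

    def condense(self, events: list) -> list:
        """Return a condensed view of the history. `events` is never
        mutated, so the complete history is preserved."""
        kept: list = []
        total = 0
        # Walk newest-to-oldest, keeping events until the budget is spent.
        for event in reversed(events):
            cost = self.count_tokens(event)
            if total + cost > self.max_tokens:
                break  # a 'summarize' strategy would condense the rest here
            kept.append(event)
            total += cost
        return list(reversed(kept))
```

This would be wired to the estimation above, e.g. `TokenBudgetCondenser(count_tokens=lambda e: count_event_tokens([e])[0], max_tokens=100_000)`, so the same token counting drives both the trigger and the truncation.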