Calvin Smith
@jpelletier1 > I'd be curious how condenser may play a role in forking a conversation - would you need to run condenser again? Based on what's proposed here I don't...
This looks clean so far. I'm a fan of moving what we can to the event stream: much better visibility than modifying messages in-place, and I think leaning into the...
If you're trying to run the tests yourself, take a look at the instructions in the README [here](https://github.com/All-Hands-AI/OpenHands/tree/97e938d5450728128ccbf896ecbc5963ac223012/evaluation/benchmarks/swe_bench). You'll have to manually replace all references to `princeton-nlp/SWE-bench_Lite` with `princeton-nlp/SWE-bench_Verified`.
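The manual replacement could also be scripted. This is a minimal sketch, not part of the OpenHands tooling: the helper name `replace_dataset_refs` is hypothetical, and it assumes you run it from a local checkout and are fine with files being rewritten in place (check `git diff` afterwards).

```python
from pathlib import Path

# Dataset identifiers taken from the comment above.
OLD = "princeton-nlp/SWE-bench_Lite"
NEW = "princeton-nlp/SWE-bench_Verified"


def replace_dataset_refs(root, old=OLD, new=NEW):
    """Rewrite every UTF-8 text file under `root` that mentions `old`.

    Hypothetical helper for illustration. Returns the list of changed paths.
    """
    changed = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        if old in text:
            path.write_text(text.replace(old, new), encoding="utf-8")
            changed.append(path)
    return changed
```

For example, `replace_dataset_refs("evaluation/benchmarks/swe_bench")` would cover the directory the README link points to, assuming that path exists in your checkout.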
I'm running into this using the development workflow (needed for the benchmarking scripts) in an Ubuntu 20.04 container.
Worth noting the linked issue (https://github.com/All-Hands-AI/OpenHands/issues/6357) is likely resolved with the context management changes from https://github.com/All-Hands-AI/OpenHands/pull/7578 and https://github.com/All-Hands-AI/OpenHands/pull/7353. I expect this looping behavior is a different beast entirely. One key...
> when delegates are used, state.history contains only the history of the active agent I'm still working through the control-flow of `controller.close`. Does this mean that if delegate A truncates,...
> > I would really like to know how this version performs on a benchmark. Unfortunately I can not run those very well myself. > > I can run evals...
Okay, I've got the data in. ## Setup I'm specifically comparing this condenser to the runs reported in the blog post here. That means a few options are standardized, and...
@happyherp I've got the trajectories for all three runs loaded up here: https://github.com/csmith49/oh-trajectories > Yes. Some weird stuff going on after the first condensation that effectively disables caching. But on...
@happyherp Did another quick run with a very small subset (16) just to check the caching behavior. Looks like your fix worked: This graph is not to be compared...