Calvin Smith

29 comments by Calvin Smith

@jpelletier1

> I'd be curious how condenser may play a role in forking a conversation - would you need to run condenser again?

Based on what's proposed here I don't...

This looks clean so far. I'm a fan of moving what we can to the event stream: much better visibility than modifying messages in-place, and I think leaning into the...

If you're trying to run the tests yourself, take a look at the instructions in the README [here](https://github.com/All-Hands-AI/OpenHands/tree/97e938d5450728128ccbf896ecbc5963ac223012/evaluation/benchmarks/swe_bench). You'll have to manually replace all references to `princeton-nlp/SWE-bench_Lite` with `princeton-nlp/SWE-bench_Verified`.
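The manual replacement described above can also be scripted. The following is a minimal sketch, not part of the OpenHands repository: the helper name `replace_dataset_refs` and the list of file suffixes are assumptions made for illustration, and the directory to scan is whatever local checkout path holds the benchmark scripts.

```python
from pathlib import Path

# Dataset identifiers taken from the comment above.
OLD = "princeton-nlp/SWE-bench_Lite"
NEW = "princeton-nlp/SWE-bench_Verified"

# Hypothetical helper: the name and the suffix list are illustrative assumptions.
def replace_dataset_refs(root, old=OLD, new=NEW,
                         suffixes=(".py", ".sh", ".md", ".toml", ".yaml", ".yml")):
    """Replace every occurrence of `old` with `new` in text files under `root`.

    Returns the list of files that were rewritten.
    """
    changed = []
    for path in sorted(Path(root).rglob("*")):
        # Only touch regular files with a known text suffix.
        if not path.is_file() or path.suffix not in suffixes:
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except UnicodeDecodeError:
            # Skip files that are not valid UTF-8 text.
            continue
        if old in text:
            path.write_text(text.replace(old, new), encoding="utf-8")
            changed.append(path)
    return changed
```

Calling `replace_dataset_refs("evaluation/benchmarks/swe_bench")` from the repository root would rewrite matching files in place, so it is worth checking the result with `git diff` before running anything.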

I'm running into this using the development workflow (needed for the benchmarking scripts) in an Ubuntu 20.04 container.

Worth noting the linked issue (https://github.com/All-Hands-AI/OpenHands/issues/6357) is likely resolved with the context management changes from https://github.com/All-Hands-AI/OpenHands/pull/7578 and https://github.com/All-Hands-AI/OpenHands/pull/7353. I expect this looping behavior is a different beast entirely. One key...

> when delegates are used, state.history contains only the history of the active agent

I'm still working through the control-flow of `controller.close`. Does this mean that if delegate A truncates,...

> > I would really like to know how this version performs on a benchmark. Unfortunately I can not run those very well myself.
>
> I can run evals...

Okay, I've got the data in.

## Setup

I'm specifically comparing this condenser to the runs reported in the blog post here. That means a few options are standardized, and...

@happyherp I've got the trajectories for all three runs loaded up here: https://github.com/csmith49/oh-trajectories

> Yes. Some weird stuff going on after the first condensation that effectively disables caching. But on...

@happyherp Did another quick run with a very small subset (16) just to check the caching behavior. Looks like your fix worked:

![visualization](https://github.com/user-attachments/assets/4dd9f16d-05d7-43f9-9170-3471be779885)

This graph is not to be compared...