OpenHands Save complete trajectory in presence of history truncation

End-user friendly description of the problem this fixes or functionality that this introduces

[ ] Include this change in the Release Notes. If checked, you must provide an end-user friendly description for your change below

Give a summary of what the PR does, explaining any non-trivial design decisions

#4977 introduced truncation feature to handle long context errors by cutting the history. Since trajectory saving feature depends on "controller.state.history", this causes saved trajectories to only contain partial history.

Direct cause is that we are not closing the controller properly in headless mode. If we had done so, the history would be recovered properly and thus trajectory would have been complete. This PR fixes this issue.

~~Note that state.history is being used in many places (including but not limited to a few benchmark evaluation harness), which we might also want to evaluate if they actually need truncated history or full history. Maybe we need better names for truncated/full history.~~

~~Also, note that stuck detector also depends on state.history, which could potentially lead to malfunction of stuck detector if the loop appears around the place where we do the truncate.~~

Anyways, this PR fixes the partial trajectory issue and includes a unit test, which could be used as a testbed for any future renaming/refactoring.~~

Kudos to @adityasoni9998 for finding this issue after analyzing a few evaluation trajectories.

Link of any specific issues this addresses

~~This also includes a small fix for #6749 in openhands/server/routes/trajectory.py~~

To run this PR locally, use the following command:

docker run -it --rm   -p 3000:3000   -v /var/run/docker.sock:/var/run/docker.sock   --add-host host.docker.internal:host-gateway   -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:e486d02-nikolaik   --name openhands-app-e486d02   docker.all-hands.dev/all-hands-ai/openhands:e486d02

Feb 17 '25 07:02 li-boxuan

#4977 introduced truncation feature to handle long context errors by cutting the history. Since trajectory saving feature depends on "controller.state.history", this causes saved trajectories to only contain partial history.

There is something I don't understand about this: https://github.com/All-Hands-AI/OpenHands/blob/3d10df4c7935124b0f7950e4024f4887af6afe1b/openhands/controller/agent_controller.py#L159

close makes sure that history has all events, so that it's as evals expect it to be: full, as far as I can see, with agent-specific events. It's supposed to happen before the controller state is read in evals and other external scripts. Isn't that happening anymore?

Feb 17 '25 11:02 enyst

Cc: @csmith49

To elaborate a bit:

when delegates are used, state.history contains only the history of the active agent
similarly, in the truncation case, state.history contains only events starting from an id
but evals or other external callers of run_controller need full history
afaict, controller.close was supposed to work for both, putting Humpty back together
when run_controller returns the final state, close must have happened
including for exceptions.

TBH this is fragile code, it doesn't feel great to rely on close being called. Still, it's close, it's bit surprising that the controller wouldn't be closed suddenly? I didn't repro the issue. 🤔

The main reason for state.history being the same variable has been that it's so common in agents frameworks/projects/experiments. Everyone seems to expect it, so it didn't seem obvious how else to call things. (When history was a class, there was a history.get_events with flags for all events or current agent.)

Re: stuck I think it's fine: stuck works with the latest 6 events in the halved history = the latest 6 events in the full history, unless this half is less than 6 events. If it is less, which should be rare if ever, it's not really stuck... 🤔 (if the LLM gets less than 6, plus the task, seems like a very edge case, but more importantly, I guess it should be able to get out by itself...)

Feb 17 '25 12:02 enyst

when delegates are used, state.history contains only the history of the active agent

I'm still working through the control-flow of controller.close. Does this mean that if delegate A truncates, then we switch to delegate B for a bit, when we switch back to delegate A we end up with the non-truncated history?

Feb 17 '25 16:02 csmith49

when delegates are used, state.history contains only the history of the active agent

I'm still working through the control-flow of controller.close. Does this mean that if delegate A truncates, then we switch to delegate B for a bit, when we switch back to delegate A we end up with the non-truncated history?

🤔 That's a good question! 😅

...Shouldn't change anything, because the B controller closes, and the B state.history may be there, but A controller doesn't read it. It doesn't seem to care about B's state.history: https://github.com/All-Hands-AI/OpenHands/blob/ae31a24c29373ccda9f098952081a4a439fbcbc1/openhands/controller/agent_controller.py#L610

A's controller has A's agent state instance, which is different, and A controller.close didn't run yet... if that made sense

Feb 17 '25 16:02 enyst

#4977 introduced truncation feature to handle long context errors by cutting the history. Since trajectory saving feature depends on "controller.state.history", this causes saved trajectories to only contain partial history.

There is something I don't understand about this:

https://github.com/All-Hands-AI/OpenHands/blob/3d10df4c7935124b0f7950e4024f4887af6afe1b/openhands/controller/agent_controller.py#L159

close makes sure that history has all events, so that it's as evals expect it to be: full, as far as I can see, with agent-specific events. It's supposed to happen before the controller state is read in evals and other external scripts. Isn't that happening anymore?

Ah interesting... so the get_history() call is supposed to work? But it isn't... (there's no delegation involved in my experiment)

FYI my commit was https://github.com/All-Hands-AI/OpenHands/commit/aae611a9d407c553ef0edbede822bb3e509c9928, which is built on top of https://github.com/All-Hands-AI/OpenHands/commit/e487008e741db8f1f86f67814b543dbec6e29a5a

Everything between id 0 and 77 are truncated/lost in the final history.

Feb 17 '25 18:02 li-boxuan

Oh wait I see the problem! controller.close() is not invoked in main.py, so the headless mode is not closing the controller properly

Feb 17 '25 18:02 li-boxuan

TBH this is fragile code, it doesn't feel great to rely on close being called.

Yeah I agree, especially fragile if we want to export/save trajectory before closing the controller, or, say, in the middle of a conversation.

The main reason for state.history being the same variable has been that it's so common in agents frameworks/projects/experiments. Everyone seems to expect it, so it didn't seem obvious how else to call things. (When history was a class, there was a history.get_events with flags for all events or current agent.)

Yeah understood, it's hard to deal with legacy esp without enough test coverage

Feb 17 '25 18:02 li-boxuan

I was just coming back to this PR after I saw nothing calls it - the agent_session does (but then that's UI only). Some recent-ish refactoring must have broken this. Weird though.

Thank you for the investigation!

Feb 17 '25 19:02 enyst