Save complete trajectory in presence of history truncation
End-user friendly description of the problem this fixes or functionality that this introduces
- [ ] Include this change in the Release Notes. If checked, you must provide an end-user friendly description for your change below
Give a summary of what the PR does, explaining any non-trivial design decisions
#4977 introduced truncation feature to handle long context errors by cutting the history. Since trajectory saving feature depends on "controller.state.history", this causes saved trajectories to only contain partial history.
Direct cause is that we are not closing the controller properly in headless mode. If we had done so, the history would be recovered properly and thus trajectory would have been complete. This PR fixes this issue.
~~Note that state.history is being used in many places (including but not limited to a few benchmark evaluation harness), which we might also want to evaluate if they actually need truncated history or full history. Maybe we need better names for truncated/full history.~~
~~Also, note that stuck detector also depends on state.history, which could potentially lead to malfunction of stuck detector if the loop appears around the place where we do the truncate.~~
Anyways, this PR fixes the partial trajectory issue and includes a unit test, which could be used as a testbed for any future renaming/refactoring.~~
Kudos to @adityasoni9998 for finding this issue after analyzing a few evaluation trajectories.
Link of any specific issues this addresses
~~This also includes a small fix for #6749 in openhands/server/routes/trajectory.py~~
To run this PR locally, use the following command:
docker run -it --rm -p 3000:3000 -v /var/run/docker.sock:/var/run/docker.sock --add-host host.docker.internal:host-gateway -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:e486d02-nikolaik --name openhands-app-e486d02 docker.all-hands.dev/all-hands-ai/openhands:e486d02
#4977 introduced truncation feature to handle long context errors by cutting the history. Since trajectory saving feature depends on "controller.state.history", this causes saved trajectories to only contain partial history.
There is something I don't understand about this: https://github.com/All-Hands-AI/OpenHands/blob/3d10df4c7935124b0f7950e4024f4887af6afe1b/openhands/controller/agent_controller.py#L159
close makes sure that history has all events, so that it's as evals expect it to be: full, as far as I can see, with agent-specific events. It's supposed to happen before the controller state is read in evals and other external scripts. Isn't that happening anymore?
Cc: @csmith49
To elaborate a bit:
- when delegates are used,
state.historycontains only the history of the active agent - similarly, in the truncation case,
state.historycontains only events starting from anid - but evals or other external callers of
run_controllerneed full history - afaict,
controller.closewas supposed to work for both, putting Humpty back together - when
run_controllerreturns the final state,closemust have happened - including for exceptions.
TBH this is fragile code, it doesn't feel great to rely on close being called. Still, it's close, it's bit surprising that the controller wouldn't be closed suddenly? I didn't repro the issue. 🤔
The main reason for state.history being the same variable has been that it's so common in agents frameworks/projects/experiments. Everyone seems to expect it, so it didn't seem obvious how else to call things. (When history was a class, there was a history.get_events with flags for all events or current agent.)
Re: stuck I think it's fine: stuck works with the latest 6 events in the halved history = the latest 6 events in the full history, unless this half is less than 6 events. If it is less, which should be rare if ever, it's not really stuck... 🤔 (if the LLM gets less than 6, plus the task, seems like a very edge case, but more importantly, I guess it should be able to get out by itself...)
when delegates are used, state.history contains only the history of the active agent
I'm still working through the control-flow of controller.close. Does this mean that if delegate A truncates, then we switch to delegate B for a bit, when we switch back to delegate A we end up with the non-truncated history?
when delegates are used, state.history contains only the history of the active agent
I'm still working through the control-flow of
controller.close. Does this mean that if delegate A truncates, then we switch to delegate B for a bit, when we switch back to delegate A we end up with the non-truncated history?
🤔 That's a good question! 😅
...Shouldn't change anything, because the B controller closes, and the B state.history may be there, but A controller doesn't read it. It doesn't seem to care about B's state.history:
https://github.com/All-Hands-AI/OpenHands/blob/ae31a24c29373ccda9f098952081a4a439fbcbc1/openhands/controller/agent_controller.py#L610
A's controller has A's agent state instance, which is different, and A controller.close didn't run yet... if that made sense
#4977 introduced truncation feature to handle long context errors by cutting the history. Since trajectory saving feature depends on "controller.state.history", this causes saved trajectories to only contain partial history.
There is something I don't understand about this:
https://github.com/All-Hands-AI/OpenHands/blob/3d10df4c7935124b0f7950e4024f4887af6afe1b/openhands/controller/agent_controller.py#L159
closemakes sure thathistoryhas all events, so that it's as evals expect it to be: full, as far as I can see, with agent-specific events. It's supposed to happen before the controller state is read in evals and other external scripts. Isn't that happening anymore?
Ah interesting... so the get_history() call is supposed to work? But it isn't... (there's no delegation involved in my experiment)
FYI my commit was https://github.com/All-Hands-AI/OpenHands/commit/aae611a9d407c553ef0edbede822bb3e509c9928, which is built on top of https://github.com/All-Hands-AI/OpenHands/commit/e487008e741db8f1f86f67814b543dbec6e29a5a
Everything between id 0 and 77 are truncated/lost in the final history.
Oh wait I see the problem! controller.close() is not invoked in main.py, so the headless mode is not closing the controller properly
TBH this is fragile code, it doesn't feel great to rely on close being called.
Yeah I agree, especially fragile if we want to export/save trajectory before closing the controller, or, say, in the middle of a conversation.
The main reason for state.history being the same variable has been that it's so common in agents frameworks/projects/experiments. Everyone seems to expect it, so it didn't seem obvious how else to call things. (When history was a class, there was a history.get_events with flags for all events or current agent.)
Yeah understood, it's hard to deal with legacy esp without enough test coverage
I was just coming back to this PR after I saw nothing calls it - the agent_session does (but then that's UI only). Some recent-ish refactoring must have broken this. Weird though.
Thank you for the investigation!