Trajectory replay on web GUI
End-user friendly description of the problem this fixes or functionality that this introduces
- [x] Include this change in the Release Notes. If checked, you must provide an end-user friendly description for your change below
Give a summary of what the PR does, explaining any non-trivial design decisions
Support trajectory replay on web GUI.
Link of any specific issues this addresses
#6049
To run this PR locally, use the following command:
docker run -it --rm -p 3000:3000 -v /var/run/docker.sock:/var/run/docker.sock --add-host host.docker.internal:host-gateway -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:44d5fd7-nikolaik --name openhands-app-44d5fd7 docker.all-hands.dev/all-hands-ai/openhands:44d5fd7
Since this is adding new UI elements, I think we'd want a quick look from the designer. CC @amanape so he can coordinate that if possible.
I agree with @mamoodi. Also, given that this is advanced functionality that most users will not be using, maybe we can put it somewhere else. For example, we could create an "advanced functionality" button in the sidebar that starts out with just this feature but eventually houses other options that aren't within the main usage path.
Yeah I agree it's "advanced" functionality that we should somehow hide from normal users.
My primary intention is to be able to replay trajectories from benchmarks for 1) debugging, and 2) demo.
I may be an odd bird, but I find a use case like this exciting:
- run on the local runtime
- until a prompt confirmation warning says you shouldn't do it (unexpected installation, red zone, rm -rf / 😅) [1]
- then press a button to transfer to the remote runtime
- replay and go
[1] Of course, the warning comes with a voice saying "I'm sorry, Boxuan, I'm afraid I can't do that."
OK, probably not a suitable use case 😅 (what about increasing the runtime resources, does it just work?)
Haha that sounds fun. It’s indeed a potentially very interesting scenario: OpenHands can fail due to hardware resource limits, a runtime crash, an API token limit… or, like you said, halt due to a security concern. “Transfer to remote runtime and replay” sounds very attractive.
@rbren were any actions taken on this?
@li-boxuan can we revert the FE changes so we can just get the backend change in?
I'm also curious if https://trajectory-visualizer.all-hands.dev/ should take the place of this
Sure I'll revert the UI changes and keep the functionality.
Trajectory replay has two uses:
- Visualize what has happened in a session, step by step
- Reproduce and, optionally, continue. E.g. I may want to test a new micro-agent at a given step, but I don't want to start my experiment over (which is costly and non-deterministic/non-reproducible).
The trajectory visualizer replaces the first use.
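For illustration, the second use (reproduce up to a given step, then continue from there) could look roughly like the sketch below. This is not the code in this PR: the `TrajectoryEvent` shape, the `replayUntil` helper, and the assumption that event ids are ascending are all hypothetical.

```typescript
// Hypothetical event shape; the real trajectory format may differ.
type TrajectoryEvent = {
  id: number;
  source: "user" | "agent";
  action: string;
};

interface ReplayResult {
  replayed: TrajectoryEvent[];
  remaining: TrajectoryEvent[];
}

// Re-execute recorded events up to and including `stopAfterId`
// (assumes ids are ascending), then return what was replayed and
// what was left, so a live agent can take over from that point.
function replayUntil(
  trajectory: TrajectoryEvent[],
  stopAfterId: number,
  execute: (event: TrajectoryEvent) => void,
): ReplayResult {
  const replayed: TrajectoryEvent[] = [];
  for (const event of trajectory) {
    if (event.id > stopAfterId) break;
    execute(event);
    replayed.push(event);
  }
  return { replayed, remaining: trajectory.slice(replayed.length) };
}

// Example: replay the first two recorded steps, leave the rest.
const recordedTrajectory: TrajectoryEvent[] = [
  { id: 1, source: "user", action: "message: fix the failing test" },
  { id: 2, source: "agent", action: "run: pytest" },
  { id: 3, source: "agent", action: "edit: utils.py" },
  { id: 4, source: "agent", action: "finish" },
];

const executedActions: string[] = [];
const replayResult = replayUntil(recordedTrajectory, 2, (event) => {
  executedActions.push(event.action);
});
```

Because the replayed steps come from the recording rather than from new LLM calls, the expensive, non-deterministic prefix of the experiment does not have to be rerun.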
@rbren @amanape I have removed the UX part. The functionality exists but is not accessible to users.
Thanks! If we want to have some esoteric key combination trigger the UI for it that's fine too. Or maybe a feature flag https://github.com/All-Hands-AI/OpenHands/blob/d9926d2491384421dc38d3e68720bfb1b486db2e/frontend/src/utils/feature-flags.ts#L4
Edit: feature flag would actually be a great way to enable this for you and other researchers
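A minimal sketch of what such a flag check could look like, assuming flags are stored as `"true"`/`"false"` strings in a browser key-value store such as `localStorage`. The `isFeatureEnabled` helper, the `FEATURE_` key prefix, and the `TRAJECTORY_REPLAY` flag name are assumptions for illustration, not necessarily what the linked `feature-flags.ts` does.

```typescript
// Minimal subset of the Web Storage interface, so the helper can be
// used with window.localStorage or with an in-memory stand-in.
interface KeyValueStore {
  getItem(key: string): string | null;
}

// Hypothetical helper: returns the stored flag value, or the default
// when no store is available or the flag has never been set.
function isFeatureEnabled(
  store: KeyValueStore | undefined,
  flagName: string,
  defaultValue: boolean = false,
): boolean {
  if (!store) return defaultValue;
  const raw = store.getItem(`FEATURE_${flagName}`);
  if (raw === null) return defaultValue;
  return raw === "true";
}

// In-memory stand-in for localStorage, for use outside a browser.
class MemoryStore implements KeyValueStore {
  private data = new Map<string, string>();

  getItem(key: string): string | null {
    return this.data.get(key) ?? null;
  }

  setItem(key: string, value: string): void {
    this.data.set(key, value);
  }
}

const flagStore = new MemoryStore();
// Disabled by default: nothing has been stored yet.
const replayEnabledBefore = isFeatureEnabled(flagStore, "TRAJECTORY_REPLAY");
// A researcher opts in by setting the (hypothetical) key.
flagStore.setItem("FEATURE_TRAJECTORY_REPLAY", "true");
const replayEnabledAfter = isFeatureEnabled(flagStore, "TRAJECTORY_REPLAY");
```

With this shape the replay UI stays hidden for normal users, and anyone who needs it can turn it on from the browser console without a separate build.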
Looks like something changed recently and broke the replay feature in the web app, but not in headless mode... will look into it.
@amanape would you like to review again? This feature is now disabled by default