[OH-Versa] Add remaining browsing & GAIA eval improvements
- [ ] This change is worth documenting at https://docs.all-hands.dev/
- [ ] Include this change in the Release Notes. If checked, you must provide an end-user friendly description for your change below
End-user friendly description of the problem this fixes or functionality this introduces.
Summarize what the PR does, explaining any non-trivial design decisions.
Ported from https://github.com/All-Hands-AI/OpenHands/pull/8598
- Add file download observation via browsing
- Some minor improvements to multimodal and text-based browsing
- GAIA eval changes
Eval result:
- GAIA
  - claude-3-7-sonnet:
    - main: 12/30
    - This PR: 15/30 (50%)
Link to any specific issues this addresses:
To run this PR locally, use the following command:
docker run -it --rm -p 3000:3000 -v /var/run/docker.sock:/var/run/docker.sock --add-host host.docker.internal:host-gateway -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:a7224c7-nikolaik --name openhands-app-a7224c7 docker.all-hands.dev/all-hands-ai/openhands:a7224c7
Thanks @ryanhoangt, has this been evaluated at all?
@OpenHands
Please add a test for download action in tests/runtime/test_browsing.py.
I'm on it! li-boxuan can track my progress at all-hands.dev
I've successfully added a test for the download action in the browser functionality to the tests/runtime/test_browsing.py file. Here's a summary of what I did:
- First, I explored the codebase to understand how the browser functionality works, particularly focusing on:
  - The browser environment setup in `openhands/runtime/browser/browser_env.py`
  - The browser actions in `openhands/events/action/browse.py`
  - The browser observations in `openhands/events/observation/browse.py`
- I created a new test function called `test_download_file` that:
  - Creates a test file to be downloaded
  - Creates an HTML page with a download link
  - Starts a simple HTTP server
  - Uses the browser to navigate to the HTML page
  - Clicks on the download link
  - Verifies that the file was downloaded to the `/workspace/downloads/` directory
  - Checks that the downloaded file has the correct content
  - Cleans up after the test
- I ran the pre-commit hooks to ensure code quality and formatting standards were met.
- I committed the changes and pushed them to the `ht/oh-versa-eval` branch, successfully updating PR #9015.
The test verifies that the browser environment is correctly configured to handle downloads, which is an important feature for agents that need to download files from the web.
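For readers who want to see the shape of that test flow without a runtime, here is a minimal, stdlib-only sketch of the same steps: serve a page with a download link, follow the link, save the file into a downloads directory, check its content, and clean up. It is not the test added in this PR. It does not use any OpenHands API, and `urllib` stands in for the browser click. The function name `run_download_check`, the file names, and the temporary `downloads/` directory (standing in for `/workspace/downloads/`) are all hypothetical.

```python
import http.server
import os
import re
import shutil
import tempfile
import threading
import urllib.parse
import urllib.request
from functools import partial


class QuietHandler(http.server.SimpleHTTPRequestHandler):
    """Static file handler that suppresses per-request logging."""

    def log_message(self, format, *args):
        pass


def run_download_check() -> str:
    """Hypothetical stand-in for test_download_file (urllib replaces the browser)."""
    root = tempfile.mkdtemp()
    try:
        serve_dir = os.path.join(root, "site")
        downloads_dir = os.path.join(root, "downloads")  # stands in for /workspace/downloads/
        os.makedirs(serve_dir)
        os.makedirs(downloads_dir)

        # 1. Create the file to be downloaded.
        expected = "This is a test file for download."
        with open(os.path.join(serve_dir, "test_file.txt"), "w", encoding="utf-8") as f:
            f.write(expected)

        # 2. Create an HTML page with a download link.
        with open(os.path.join(serve_dir, "index.html"), "w", encoding="utf-8") as f:
            f.write('<a href="test_file.txt" download>Download</a>')

        # 3. Start a simple HTTP server on a free port (port 0 lets the OS choose).
        handler = partial(QuietHandler, directory=serve_dir)
        server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), handler)
        threading.Thread(target=server.serve_forever, daemon=True).start()
        try:
            page_url = f"http://127.0.0.1:{server.server_address[1]}/index.html"
            # 4. "Navigate" to the page and locate the download link.
            with urllib.request.urlopen(page_url) as resp:
                page = resp.read().decode("utf-8")
            href = re.search(r'href="([^"]+)"', page).group(1)
            # 5. Follow the link and save the file into the downloads directory.
            with urllib.request.urlopen(urllib.parse.urljoin(page_url, href)) as resp:
                data = resp.read()
            saved = os.path.join(downloads_dir, os.path.basename(href))
            with open(saved, "wb") as f:
                f.write(data)
        finally:
            server.shutdown()
            server.server_close()

        # 6. Verify that the downloaded file has the expected content.
        with open(saved, encoding="utf-8") as f:
            content = f.read()
        assert content == expected
        return content
    finally:
        # 7. Clean up all temporary files and directories.
        shutil.rmtree(root)
```

In the real runtime test, steps 4 and 5 are performed by the browser environment, and the assertion checks the file that the browser wrote into `/workspace/downloads/`.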
@ryanhoangt I think I'll let @adityasoni9998 and @li-boxuan review this one, but I'm very happy to have it in.
Looks like there are a few issues preventing this PR from being merged!
- GitHub Actions are failing:
  - Docker
If you'd like me to help, just leave a comment, like
@OpenHands please fix the failing actions on PR #9015
Feel free to include any additional details that might help me get this PR into a better state.