
[OH-Versa] Add remaining browsing & GAIA eval improvement

Open • ryanhoangt opened this issue 7 months ago • 6 comments

  • [ ] This change is worth documenting at https://docs.all-hands.dev/
  • [ ] Include this change in the Release Notes. If checked, you must provide an end-user friendly description for your change below

End-user friendly description of the problem this fixes or functionality this introduces.


Summarize what the PR does, explaining any non-trivial design decisions.

Ported from https://github.com/All-Hands-AI/OpenHands/pull/8598

  • Add file download observation via browsing
  • Some minor improvements to multimodal and text-based browsing
  • GAIA eval changes

Eval result:

  • GAIA - claude-3-7-sonnet:
    • main: 12/30 (40%)
    • This PR: 15/30 (50%)

Link of any specific issues this addresses:


To run this PR locally, use the following command:

docker run -it --rm \
    -p 3000:3000 \
    -v /var/run/docker.sock:/var/run/docker.sock \
    --add-host host.docker.internal:host-gateway \
    -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:a7224c7-nikolaik \
    --name openhands-app-a7224c7 \
    docker.all-hands.dev/all-hands-ai/openhands:a7224c7

ryanhoangt, Jun 09 '25 17:06

Thanks @ryanhoangt, has this been evaluated at all?

neubig, Jun 09 '25 18:06

@OpenHands

Please add a test for download action in tests/runtime/test_browsing.py.

li-boxuan, Jun 10 '25 03:06

I'm on it! li-boxuan can track my progress at all-hands.dev

openhands-ai[bot], Jun 10 '25 03:06

I've successfully added a test for the download action in the browser functionality to the tests/runtime/test_browsing.py file. Here's a summary of what I did:

  1. First, I explored the codebase to understand how the browser functionality works, particularly focusing on:

    • The browser environment setup in openhands/runtime/browser/browser_env.py
    • The browser actions in openhands/events/action/browse.py
    • The browser observations in openhands/events/observation/browse.py
  2. I created a new test function called test_download_file that:

    • Creates a test file to be downloaded
    • Creates an HTML page with a download link
    • Starts a simple HTTP server
    • Uses the browser to navigate to the HTML page
    • Clicks on the download link
    • Verifies that the file was downloaded to the /workspace/downloads/ directory
    • Checks that the downloaded file has the correct content
    • Cleans up after the test
  3. I ran the pre-commit hooks to ensure code quality and formatting standards were met.

  4. I committed the changes and pushed them to the ht/oh-versa-eval branch, successfully updating PR #9015.

The test verifies that the browser environment is correctly configured to handle downloads, which is an important feature for agents that need to download files from the web.
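The fixture side of the test described above (test file, HTML page with a download link, a simple HTTP server, following the link, checking the content, cleanup) can be sketched with the Python standard library alone. This is a minimal, hypothetical sketch and not the code that was committed to tests/runtime/test_browsing.py: the OpenHands browser navigation and click are replaced by plain `urllib` requests, a temporary directory stands in for /workspace/downloads/, and the file names `test_file.txt` and `download_page.html` are illustrative assumptions.

```python
import functools
import http.server
import re
import tempfile
import threading
import urllib.parse
import urllib.request
from pathlib import Path


class QuietHandler(http.server.SimpleHTTPRequestHandler):
    """Static file handler that suppresses per-request logging."""

    def log_message(self, format, *args):
        pass  # keep test output clean


def run_download_fixture() -> str:
    """Serve a page with a download link, follow the link, and return the downloaded text.

    Hypothetical stand-in: urllib replaces the OpenHands browser, and a temp
    directory replaces /workspace/downloads/.
    """
    with tempfile.TemporaryDirectory() as serve_dir, tempfile.TemporaryDirectory() as download_dir:
        root = Path(serve_dir)
        # 1. Create the file to be downloaded (illustrative name and content).
        (root / "test_file.txt").write_text("Hello from the download test!\n")
        # 2. Create an HTML page whose link carries the `download` attribute.
        (root / "download_page.html").write_text(
            '<html><body><a id="dl" href="test_file.txt" download>Download</a></body></html>'
        )
        # 3. Start a simple HTTP server on an ephemeral port, serving serve_dir.
        handler = functools.partial(QuietHandler, directory=serve_dir)
        server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), handler)
        threading.Thread(target=server.serve_forever, daemon=True).start()
        try:
            page_url = f"http://127.0.0.1:{server.server_address[1]}/download_page.html"
            # 4. "Navigate" to the page (the real test drives the agent's browser).
            with urllib.request.urlopen(page_url) as resp:
                html = resp.read().decode()
            # 5. "Click" the link: resolve its href against the page URL and fetch it.
            href = re.search(r'href="([^"]+)"', html).group(1)
            with urllib.request.urlopen(urllib.parse.urljoin(page_url, href)) as resp:
                data = resp.read()
            target = Path(download_dir) / Path(href).name
            target.write_bytes(data)
            # 6. Verify the file landed in the download directory.
            assert target.exists()
            return target.read_text()
        finally:
            # 7. Clean up: stop the server; the temp dirs are removed on exit.
            server.shutdown()
            server.server_close()


if __name__ == "__main__":
    print(run_download_fixture(), end="")
```

Binding to port 0 lets the OS pick a free port, which avoids collisions when tests run in parallel. In the real runtime test, steps 4 and 5 would go through the browser environment, and the check in step 6 would look in /workspace/downloads/ as the summary describes.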

openhands-ai[bot], Jun 10 '25 03:06

@ryanhoangt I think I'll let @adityasoni9998 and @li-boxuan review this one, but I'm very happy to have it in.

neubig, Jun 14 '25 12:06

Looks like there are a few issues preventing this PR from being merged!

  • GitHub Actions are failing:
    • Docker

If you'd like me to help, just leave a comment like:

@OpenHands please fix the failing actions on PR #9015

Feel free to include any additional details that might help me get this PR into a better state.


openhands-ai[bot], Jun 25 '25 04:06