
VisualWebArena is a benchmark for multimodal agents.

Results 20 visualwebarena issues

Hi team, thanks for releasing this interesting work. I have a question about the unit test file (test_action_functionalities.py). Ideally, it should parse some text like: "textbox' Full name'" ![Screenshot 2024-07-18 at...

I am trying to show the trace of one of the trace files, 463.trace.zip. This is the command I am using: ``` unzip 463.trace.zip -d 463_trace xvfb-run playwright show-trace 463_trace...

Thanks for open-sourcing such great work. May I know how to use Apptainer instead of Docker to start up the websites?

**Reason for Change**: Some answers in test_webarena.raw.json are incorrect. I believe minor fixes are needed for more accurate evaluation. **Changes Made**: I mainly fixed two types of configuration: 1. I...

```
FAILED tests/test_browser_env/test_actions.py::test_is_equivalent - ValueError: Unknown action type: ACTION_TYPES.CLEAR
FAILED tests/test_browser_env/test_actions.py::test_action2create_function - NameError: name 'create_clear_action' is not defined
```
@ljang0

## Documentation Update README files for compatibility with both WebArena (WA) and VisualWebArena (VWA). ## WebArena 2.0 WebArena 2.0 addresses annotation issues [reported by various users](https://github.com/web-arena-x/webarena/labels/annotation%20issue). Specifically: - WebArena 2.0...

![Screenshot from 2024-09-21 19-49-49](https://github.com/user-attachments/assets/cdd74afa-2d67-4633-9235-a10d3c20e69c)

Hey, the [docker setup](https://github.com/web-arena-x/visualwebarena/tree/main/environment_docker) is cumbersome. I would love to help you simplify it if we could get access to the Dockerfile and compose.yaml files. That would also allow us...

Hello, do you have any advice on how to set up multiple Docker containers for the same website? For example, we could set up 10 shopping websites on different ports. So...
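
For the multi-instance question above, one possible starting point is to generate one `docker run` command per instance, each publishing the same container port on a different host port. This is only a minimal sketch: the image name `shopping_image`, the container port `80`, the base host port `7770`, and the helper `build_run_commands` are all assumptions for illustration, not values confirmed by the issue. It only builds the command strings and does not cover any site-specific configuration (such as base URLs) that each instance might also need.

```python
import shlex


def build_run_commands(image, container_port, base_host_port, count,
                       name_prefix="shopping"):
    """Return one `docker run` command string per instance.

    Instance i is named `<name_prefix>_<i>` and maps host port
    `base_host_port + i` to `container_port` inside the container.
    All argument values are placeholders chosen by the caller.
    """
    commands = []
    for i in range(count):
        host_port = base_host_port + i
        args = [
            "docker", "run", "-d",                    # run detached
            "--name", f"{name_prefix}_{i}",           # unique container name
            "-p", f"{host_port}:{container_port}",    # host:container port mapping
            image,
        ]
        # shlex.join quotes arguments safely for a POSIX shell
        commands.append(shlex.join(args))
    return commands


if __name__ == "__main__":
    # Hypothetical values: 10 instances on host ports 7770 through 7779
    for cmd in build_run_commands("shopping_image", 80, 7770, 10):
        print(cmd)
```

The printed commands can be reviewed before being run, which keeps the port assignment explicit and easy to adjust.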