
Code repo for "WebArena: A Realistic Web Environment for Building Autonomous Agents"

Results 28 webarena issues

I would like to re-open issue #104. There's an overuse of exact matches in the eval harness. For example, consider task 649: ``` "intent": "Post in history subreddit about what...

annotation issue

I am not entirely sure, but I noticed something odd with the Reddit website in the benchmark. I was looking at task 29: "Tell me the count of comments that...

Dear Authors, I deployed the GitLab website locally, but I found that it is unstable: if I keep visiting it, it sometimes works normally and sometimes reports 502....
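For anyone hitting the same intermittent 502s, a client-side workaround is to poll the site until it answers with a non-5xx status before starting a task. The snippet below is only a sketch under stated assumptions: `fetch_status` and `wait_until_healthy` are hypothetical helper names, not part of the WebArena code base, and polling does not address whatever causes the 502 on the server side.

```python
import time
import urllib.error
import urllib.request


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for url (hypothetical helper)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses raise HTTPError; the code is still informative.
        return err.code
    except urllib.error.URLError:
        # Assumption: treat connection failures like "service unavailable".
        return 503


def wait_until_healthy(url, fetch=fetch_status, attempts=5,
                       base_delay=1.0, sleep=time.sleep) -> bool:
    """Poll url with exponential backoff until it stops returning 5xx."""
    for attempt in range(attempts):
        if fetch(url) < 500:
            return True
        if attempt < attempts - 1:
            # Delay doubles each round: base_delay, 2x, 4x, ...
            sleep(base_delay * 2 ** attempt)
    return False
```

The `fetch` and `sleep` parameters are injectable so the loop can be exercised without a running server.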

Dear authors, Thanks for your nice work and quick follow-ups for issues. I found two problems when using the benchmark, especially on the map website: 1. The pop-up windows, e.g.,...

There's a typo in task template 19 (noticed for task 649). The template is: "Post in {{subreddit}} subreddit about what could diffusion model help the **correpong** field" It should be...

annotation issue

The gold URL for task #49 contains a bug. ``` { "sites": [ "gitlab" ], "task_id": 45, "require_login": true, "storage_state": "./.auth/gitlab_state.json", "start_url": "__GITLAB__/a11yproject/a11yproject.com", "geolocation": null, "intent_template": "Check out the most...

annotation issue

Hi there. I am working on some map tasks and have found them hard to accomplish even manually. The usage of OpenStreetMap is quite different from the...

I am wondering why the docker tar files are so large. I am trying to reproduce the environment locally to avoid paying the EC2 pricing.

Hi, I'm reporting a small bug: the prompts all say to use "scroll [direction=up|down]", but the parsing function expects "scroll [up|down]". As a result, the parsing always fails for the...
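One way to reconcile the two formats is to make the parser accept both spellings. The following is a minimal sketch, assuming the action arrives as a plain string; `SCROLL_RE` and `parse_scroll_direction` are hypothetical names for illustration and are not the repository's actual parsing function.

```python
import re

# Hypothetical pattern: matches "scroll [up]", "scroll [down]",
# "scroll [direction=up]" and "scroll [direction=down]".
SCROLL_RE = re.compile(
    r"scroll\s*\[\s*(?:direction\s*=\s*)?(up|down)\s*\]",
    re.IGNORECASE,
)


def parse_scroll_direction(action: str) -> str:
    """Return "up" or "down" for a scroll action, or raise ValueError."""
    match = SCROLL_RE.search(action)
    if match is None:
        raise ValueError(f"Invalid scroll action: {action!r}")
    return match.group(1).lower()
```

The alternative is to change the prompts to say "scroll [up|down]"; accepting both forms in the parser has the advantage that outputs produced with the existing prompts still parse.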

Hi, do we support Hugging Face models? If any internal tests were done with Llama and the insights can be shared, that would be great.