webarena
Code repo for "WebArena: A Realistic Web Environment for Building Autonomous Agents"
Hey all! Getting started with webarena, and it holds a lot of promise! I'm getting this error when running ``` python run.py --instruction_path agent/prompts/jsons/p_cot_id_actree_2s.json --test_start_idx 0 --test_end_idx 3 --model gpt-3.5-turbo...
There's nondeterminism in the shopping-admin website, in the bestsellers report. To reproduce, visit [this bestsellers report](http://ec2-3-131-244-37.us-east-2.compute.amazonaws.com:7780/admin/reports/report_sales/bestsellers/filter/cGVyaW9kX3R5cGU9bW9udGgmZnJvbT0xJTJGMSUyRjIyJnRvPTMlMkYzMSUyRjIyJnNob3dfZW1wdHlfcm93cz0w/) URL and click "show report" several times. Each time it is clicked,...
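When reproducing this, it can help to see which filter the report URL actually encodes. The sketch below assumes the path segment after `filter/` is a base64-encoded query string, which is how this particular URL decodes; `decode_report_filter` is a hypothetical helper name, not part of WebArena.

```python
import base64
from urllib.parse import parse_qsl, urlparse

# URL copied from the issue text above.
REPORT_URL = (
    "http://ec2-3-131-244-37.us-east-2.compute.amazonaws.com:7780"
    "/admin/reports/report_sales/bestsellers/filter/"
    "cGVyaW9kX3R5cGU9bW9udGgmZnJvbT0xJTJGMSUyRjIyJnRvPTMlMkYzMSUyRjIyJnNob3dfZW1wdHlfcm93cz0w/"
)


def decode_report_filter(url: str) -> dict:
    """Return the filter parameters encoded in a report URL (hypothetical helper)."""
    # Split the path into non-empty segments and take the one after "filter".
    segments = [s for s in urlparse(url).path.split("/") if s]
    encoded = segments[segments.index("filter") + 1]
    # Restore any stripped base64 padding before decoding.
    padded = encoded + "=" * (-len(encoded) % 4)
    query = base64.b64decode(padded).decode("utf-8")
    # parse_qsl also percent-decodes values such as "1%2F1%2F22".
    return dict(parse_qsl(query))


if __name__ == "__main__":
    print(decode_report_filter(REPORT_URL))
```

For the URL above, the decoded filter is a monthly period from 1/1/22 to 3/31/22 with empty rows hidden, so repeated clicks on "show report" are issuing the same query each time.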
The voting function is currently broken: if an upvote or downvote is applied, the vote is reset to `-1`. @robert1003 recently fixed it. @frankxu2004 please coordinate and see what is...
We've recently started to onboard WebArena for evaluating AutoGen, and have encountered a persistent issue: GPT-4-based agents keep trying to visit the real Reddit website, and complain when they can't....
For a typical drop-down menu in the shopping and map environments, WebArena cannot capture the menu's content due to a Playwright problem.
Hi Shuyan, Thanks for creating and sharing this amazing environment! I've observed that the external link (that jumps to a new page) cannot be accessed via the 'click' action. An...
Dear authors, Thanks for your brilliant work. When I tried to use this benchmark, I found two possible problems: 1. if some action, e.g. clicking a link, will open a...
Hi, I have been testing on your wonderful datasets for months, and I now plan to run more experiments on them. I found that `_get_obs()` takes a lot of...