BrowserGym
🌎💪 BrowserGym, a Gym environment for web task automation
When running on the test split (381/381; I think it's task ID 811, but I'm not sure), it hangs at the following stages:
> 2024-12-17 02:32:41,293 - 345050 - browsergym.experiments.loop - INFO...
GAIA
- add GAIA and the GAIA eval (based on the AssistantBench PR: https://github.com/ServiceNow/BrowserGym/pull/186/)
- refactor the code that writes predictions to JSONL into a utils file
- fix the AssistantBench README
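A minimal sketch of the kind of shared JSONL helper the refactoring bullet describes. It assumes each prediction is a JSON-serializable dict. The names `write_predictions_jsonl` and `read_predictions_jsonl` and the record fields are hypothetical and not taken from the PR.

```python
import json
from pathlib import Path
from typing import Any, Iterable, List, Mapping, Union


def write_predictions_jsonl(path: Union[str, Path], predictions: Iterable[Mapping[str, Any]]) -> int:
    """Hypothetical helper: write one JSON object per line and return how many were written."""
    path = Path(path)
    # Create the output directory if needed so callers don't have to.
    path.parent.mkdir(parents=True, exist_ok=True)
    count = 0
    with path.open("w", encoding="utf-8") as f:
        for record in predictions:
            # JSONL: exactly one compact JSON document per line.
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count


def read_predictions_jsonl(path: Union[str, Path]) -> List[Any]:
    """Hypothetical inverse: parse every non-empty line back into a Python object."""
    with Path(path).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Keeping both directions in one utils module means each benchmark's eval script (GAIA, AssistantBench) can share the same on-disk format.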
## Pull Request: Integrate WebCanvas Key Node Evaluation and Mind2web-live Benchmark into BrowserGym

### Description

This PR officially integrates the **WebCanvas key node evaluation** and the **Mind2web-live benchmark** into **BrowserGym**....
Moving version fetching to BrowserGym. It feels too hardcoded at the moment; maybe we could deduce the package name automatically? For example, with some kind of regex on the package name.
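One way the package name could be deduced, sketched under the assumption that each distribution name is the first two components of the module path joined by a hyphen (e.g. `browsergym.experiments.loop` → `browsergym-experiments`). `guess_dist_name` and `get_version` are hypothetical helpers, not BrowserGym's implementation; only `importlib.metadata.version` and `PackageNotFoundError` are standard-library APIs.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# Assumption: distribution name = "<top-level>-<subpackage>", derived from the module path.
_MODULE_RE = re.compile(r"^([A-Za-z_]\w*)(?:\.([A-Za-z_]\w*))?")


def guess_dist_name(module_name: str) -> str:
    """Map e.g. 'browsergym.experiments.loop' to 'browsergym-experiments'."""
    match = _MODULE_RE.match(module_name)
    if match is None:
        raise ValueError(f"not a module path: {module_name!r}")
    # Drop the optional second component when the module is top-level only.
    return "-".join(part for part in match.groups() if part)


def get_version(module_name: str) -> Optional[str]:
    """Installed version of the guessed distribution, or None if it isn't installed."""
    try:
        return version(guess_dist_name(module_name))
    except PackageNotFoundError:
        return None
```

Inside a module this would be called as `get_version(__name__)`. If a package does not follow the naming convention, the lookup returns `None` instead of raising.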
Hi, thanks for the project! I'm trying to implement and experiment with coordinate-based actions from `browsergym` and it would be useful if the environment exposes this info via the observation....
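As an illustration of how exposed element geometry could feed coordinate-based actions, here is a sketch that turns a bounding box into a `mouse_click(x, y, button=...)` action string. It assumes the box is given as `(left, top, width, height)` in the same viewport pixel space the coordinate actions use. `bbox_center` and `click_action_for_bbox` are hypothetical helpers, not part of `browsergym`.

```python
from typing import Sequence, Tuple


def bbox_center(bbox: Sequence[float]) -> Tuple[float, float]:
    """Center of a box given as (left, top, width, height); this layout is an assumption."""
    left, top, width, height = bbox
    if width < 0 or height < 0:
        raise ValueError("width and height must be non-negative")
    return left + width / 2, top + height / 2


def click_action_for_bbox(bbox: Sequence[float], button: str = "left") -> str:
    """Build a coordinate click action string targeting the box center."""
    x, y = bbox_center(bbox)
    # Round to whole pixels so the action string stays short.
    return f"mouse_click({round(x)}, {round(y)}, button={button!r})"
```

Clicking the center rather than a corner keeps the target inside the element even if the box is off by a pixel or two.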
There is an issue with some WebArena shopping tasks:
- On task 275, it's a search task where the agent is asked to search for "xbox". So the reference URL...
Introduce the webarena_verified benchmark.
- tasks are registered with this template: `webarena_verified.{intent_template_id}.{task_id}`
- new `WebArenaVerifiedTask` class overrides the `setup()` function of `GenericWebArenaTask` to:
  - use the webarena_verified evaluator
  - load...
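A small sketch of building and parsing names that follow the `webarena_verified.{intent_template_id}.{task_id}` template. It assumes both fields are non-negative integers. `make_task_id` and `parse_task_id` are illustrative helpers, not the PR's code, and the numbers in the usage are arbitrary.

```python
import re
from typing import Tuple

TASK_ID_TEMPLATE = "webarena_verified.{intent_template_id}.{task_id}"
# Assumption: both identifiers are plain non-negative integers.
_TASK_ID_RE = re.compile(r"^webarena_verified\.(\d+)\.(\d+)$")


def make_task_id(intent_template_id: int, task_id: int) -> str:
    """Fill the registration template with the two identifiers."""
    return TASK_ID_TEMPLATE.format(intent_template_id=intent_template_id, task_id=task_id)


def parse_task_id(name: str) -> Tuple[int, int]:
    """Recover (intent_template_id, task_id) from a registered name."""
    m = _TASK_ID_RE.match(name)
    if m is None:
        raise ValueError(f"not a webarena_verified task id: {name!r}")
    return int(m.group(1)), int(m.group(2))
```

Putting `intent_template_id` before `task_id` means sorting the names groups tasks that share a template next to each other.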
Hi, in the following step, the action `mouse_click(1219, 228, button='left')` is not executed properly on WorkArena L1, even though the blue cursor shows the mouse is over the right element....