BrowserGym

WebArena Verified

Open · NicolasAG opened this issue 3 weeks ago · 2 comments

Introduce the webarena_verified benchmark.

  • tasks are registered under IDs that follow the template webarena_verified.{intent_template_id}.{task_id}
  • a new WebArenaVerifiedTask class overrides the setup() method of GenericWebArenaTask to:
    • use the webarena_verified evaluator
    • load extra HTTP headers from PW_EXTRA_HEADERS, if it is set (used to pass the secret keys needed to access self-hosted WebArena instances)
    • append a hint to the goal instructing the model to return its answer in the response format the webarena_verified evaluator expects
  • a new WebArenaVerifiedEvaluator class calls webarena_verified.api.WebArenaVerifiedEvaluator from platform-labs-webarena-verified
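The three pieces of the setup() override described above can be sketched as small standalone helpers. This is a minimal illustration, not the code from the issue. The helper names (make_task_id, load_extra_headers, add_format_hint) are hypothetical. The assumption that PW_EXTRA_HEADERS holds a JSON object is also unverified, as is the wording of RESPONSE_FORMAT_HINT.

```python
import json
import os
from typing import Mapping, Optional

# Hypothetical hint text: the actual wording appended by WebArenaVerifiedTask
# is not given in the issue.
RESPONSE_FORMAT_HINT = (
    "When you are done, reply with your final answer in the response format "
    "expected by the webarena_verified evaluator."
)


def make_task_id(intent_template_id: int, task_id: int) -> str:
    """Build a task name following webarena_verified.{intent_template_id}.{task_id}."""
    return f"webarena_verified.{intent_template_id}.{task_id}"


def load_extra_headers(environ: Optional[Mapping[str, str]] = None) -> dict:
    """Read extra HTTP headers from PW_EXTRA_HEADERS.

    Assumption: the variable holds a JSON object such as {"X-Api-Key": "..."}.
    Returns an empty dict when the variable is unset or blank.
    """
    env = os.environ if environ is None else environ
    raw = env.get("PW_EXTRA_HEADERS", "").strip()
    if not raw:
        return {}
    headers = json.loads(raw)
    if not isinstance(headers, dict):
        raise ValueError("PW_EXTRA_HEADERS must be a JSON object")
    # HTTP header names and values are sent as strings, so coerce both.
    return {str(k): str(v) for k, v in headers.items()}


def add_format_hint(goal: str, hint: str = RESPONSE_FORMAT_HINT) -> str:
    """Append the response-format hint to the task goal, separated by a blank line."""
    return f"{goal.rstrip()}\n\n{hint}"
```

In a Playwright-based setup, the resulting dict could be applied to a browser context with `context.set_extra_http_headers(headers)`, so every request to the self-hosted instance carries the secret key. Whether WebArenaVerifiedTask does exactly this is an assumption.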

NicolasAG · Nov 06 '25 16:11