BrowserGym
WebArena Verified
Introduce the webarena_verified benchmark.
- tasks are registered with this template: `webarena_verified.{intent_template_id}.{task_id}`
- new `WebArenaVerifiedTask` class that overrides the `setup()` function of `GenericWebArenaTask` to:
  - use the webarena_verified evaluator
  - load extra HTTP headers if `PW_EXTRA_HEADERS` is set (used to store secret keys for accessing self-hosted WebArena instances)
  - append a hint to the `goal` telling the model to return its answer in the response format expected by the webarena_verified evaluator
- new `WebArenaVerifiedEvaluator` class that calls `webarena_verified.api.WebArenaVerifiedEvaluator` from platform-labs-webarena-verified
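A minimal sketch of how task names following the `webarena_verified.{intent_template_id}.{task_id}` template could be built and parsed back. `TaskRecord`, `make_task_name`, and `parse_task_name` are hypothetical helpers for illustration, not BrowserGym's actual registration code.

```python
from dataclasses import dataclass

# Template from this PR; the helpers below are hypothetical illustrations.
TASK_ID_TEMPLATE = "webarena_verified.{intent_template_id}.{task_id}"


@dataclass(frozen=True)
class TaskRecord:
    # Hypothetical record whose fields mirror the template placeholders.
    intent_template_id: int
    task_id: int


def make_task_name(record: TaskRecord) -> str:
    """Fill the registration template for one task."""
    return TASK_ID_TEMPLATE.format(
        intent_template_id=record.intent_template_id,
        task_id=record.task_id,
    )


def parse_task_name(name: str) -> TaskRecord:
    """Recover the two IDs from a registered task name."""
    prefix, template_id, task_id = name.split(".")
    if prefix != "webarena_verified":
        raise ValueError(f"not a webarena_verified task: {name!r}")
    return TaskRecord(int(template_id), int(task_id))
```

Because the intent template ID comes before the task ID, all tasks that share one intent template sort next to each other by name.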
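The `PW_EXTRA_HEADERS` handling in `setup()` could look roughly like the sketch below. It assumes, hypothetically, that the variable holds a JSON object mapping header names to values. The PR text does not specify the encoding. `load_extra_headers` is an illustrative name.

```python
import json
import os
from typing import Dict, Mapping, Optional


def load_extra_headers(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return extra HTTP headers from PW_EXTRA_HEADERS, or {} if it is unset.

    Hypothetical format assumption: the variable holds a JSON object such as
    '{"X-Api-Key": "..."}'.
    """
    env = os.environ if env is None else env
    raw = env.get("PW_EXTRA_HEADERS")
    if not raw:
        return {}  # variable unset or empty: send no extra headers
    headers = json.loads(raw)
    # Reject anything that is not a flat string -> string mapping.
    if not isinstance(headers, dict) or not all(
        isinstance(k, str) and isinstance(v, str) for k, v in headers.items()
    ):
        raise ValueError("PW_EXTRA_HEADERS must be a JSON object of string -> string")
    return headers
```

The resulting dict could then be passed to Playwright's `BrowserContext.set_extra_http_headers`. Keeping the secrets in an environment variable means they are never written into task configs.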
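Appending the response-format hint to the `goal` can be sketched as a small pure function. The hint wording below is a hypothetical placeholder, because the PR does not quote the actual text. `append_format_hint` is likewise an illustrative name.

```python
# Hypothetical placeholder; the real hint text used by the PR is not shown.
RESPONSE_FORMAT_HINT = (
    "When you are done, reply with your final answer in the response "
    "format expected by the webarena_verified evaluator."
)


def append_format_hint(goal: str, hint: str = RESPONSE_FORMAT_HINT) -> str:
    """Append the response-format hint to the task goal after a blank line."""
    goal = goal.rstrip()
    if goal.endswith(hint):  # avoid appending the hint twice if setup() reruns
        return goal
    return f"{goal}\n\n{hint}" if goal else hint
```

Keeping this as a separate function makes the hint easy to unit-test independently of the browser setup.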