[Feature Request] Integrate WebCanvas

Open dandansamax opened this issue 5 months ago • 0 comments

Required prerequisites

  • [X] I have searched the Issue Tracker and confirmed that this hasn't already been reported. (+1 or comment there if it has.)

Motivation

WebCanvas: Benchmarking Web Agents in Online Environments is an advanced benchmark framework for web agents that shares several ideas with CRAB.

WebCanvas provides three main components:

  1. A novel evaluation metric that reliably captures the critical intermediate actions or states necessary for task completion while disregarding noise caused by insignificant events or changed web elements.
  2. A benchmark dataset called Mind2Web-Live, a refined version of the original static Mind2Web dataset, containing 542 tasks with 2439 intermediate evaluation states.
  3. Lightweight and generalizable annotation tools and testing pipelines that enable the community to collect and maintain a high-quality, up-to-date dataset.

We should consider integrating the WebCanvas dataset, which would fit well into CRAB.

Solution

No response

Additional context

No response

dandansamax, Sep 05 '24 13:09