crab
crab copied to clipboard
[Feature Request] Integrate WebCanvas
Required prerequisites
- [X] I have searched the Issue Tracker that this hasn't already been reported. (+1 or comment there if it has.)
Motivation
WebCanvas: Benchmarking Web Agents in Online Environments is a advanced web agent benchmark framework that shares a similar idea with CRAB in some perspectives.
WebCanva provides three main components:
- A novel evaluation metric which reliably capture critical intermediate actions or states necessary for task completions while disregarding noise caused by insignificant events or changed web-elements.
- A benchmark dataset called Mind2Web-Live, a refined version of original Mind2Web static dataset containing 542 tasks with 2439 intermediate evaluation states.
- Lightweight and generalizable annotation tools and testing pipelines that enables the community to collect and maintain the high-quality, up-to-date dataset.
We should consider integrating WebCanvas dataset which is perfectly fit into CRAB.
Solution
No response
Additional context
No response