
What is the meaning of an empty `pos_candidates`?

Open · liaopeiyuan opened this issue 8 months ago · 1 comment

There are 761 rows in the HuggingFace dataset `osunlp/Multimodal-Mind2Web` that have an empty `pos_candidates`.

These rows span 497 tasks, broken down by split:

{'test_domain': 164, 'test_task': 47, 'test_website': 34, 'train': 252}
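For reference, here is a minimal sketch of how counts like these could be reproduced. It is not necessarily how the numbers above were computed. It assumes each row is a dict with a `pos_candidates` list and an `annotation_id` field identifying the task; the helper name `count_empty_pos_candidates` and the toy rows are made up for illustration. With the real data, the rows for each split would come from something like `datasets.load_dataset("osunlp/Multimodal-Mind2Web", split=...)`.

```python
from collections import defaultdict


def count_empty_pos_candidates(rows_by_split):
    """Count rows whose `pos_candidates` is empty, and the distinct tasks they touch.

    `rows_by_split` maps a split name to an iterable of row dicts.
    Assumption: `annotation_id` identifies the task a step belongs to.
    """
    empty_rows = 0
    tasks = defaultdict(set)
    for split, rows in rows_by_split.items():
        for row in rows:
            if not row["pos_candidates"]:  # empty list (or missing value)
                empty_rows += 1
                tasks[split].add(row["annotation_id"])
    return empty_rows, {split: len(ids) for split, ids in tasks.items()}


# Toy stand-in for the real dataset (hypothetical values).
toy = {
    "train": [
        {"annotation_id": "a", "pos_candidates": []},
        {"annotation_id": "a", "pos_candidates": []},
        {"annotation_id": "b", "pos_candidates": ["{...}"]},
    ],
    "test_task": [
        {"annotation_id": "c", "pos_candidates": []},
    ],
}

n_rows, tasks_per_split = count_empty_pos_candidates(toy)
print(n_rows, tasks_per_split)
```

Two empty steps in the same task count as two rows but one task, which is why the row total (761) can exceed the task total (497).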

Here's a sample task that has an empty `pos_candidates` in one of its steps: https://huggingface.co/datasets/osunlp/Multimodal-Mind2Web/viewer/default/train?q=6687eb6c-7154-4176-83a8-e841f78089d9 (row=1659)

It appears that `src/data_utils/evaluation_utils.py` and `src/offline_experiments/screenshot_generation/*.py` treat a step with an empty `pos_candidates` as a failure of the agent. Since "A task is regarded as successful only if all steps have succeeded," any task containing such a step can never be counted as successful, so it is unclear what the accuracy gap in the "whole success rate" in Table 4 actually measures.
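To make the concern concrete, here is a small sketch of how the two possible treatments of empty-`pos_candidates` steps change the whole-task success rate. This is not the code in the SeeAct repo; the function name, the `skip_empty` flag, and the `predicted_correctly` field are hypothetical and only serve the illustration.

```python
def whole_task_success_rate(tasks, skip_empty=False):
    """Fraction of tasks in which every scored step succeeded.

    `tasks` maps a task id to a list of step dicts with:
      - `pos_candidates`: list of ground-truth elements (may be empty)
      - `predicted_correctly`: hypothetical flag for the agent's step result
    With skip_empty=False, a step with no positive candidate counts as a failure.
    With skip_empty=True, such steps are left out of the scoring.
    """
    successes = 0
    for steps in tasks.values():
        scored = [
            s for s in steps
            if not (skip_empty and not s["pos_candidates"])
        ]
        # Note: a task whose steps are all skipped counts as successful here.
        ok = all(s["pos_candidates"] and s["predicted_correctly"] for s in scored)
        successes += int(ok)
    return successes / len(tasks)


# Toy example: task "t1" contains one step with no positive candidate.
toy_tasks = {
    "t1": [
        {"pos_candidates": ["el_1"], "predicted_correctly": True},
        {"pos_candidates": [], "predicted_correctly": False},
    ],
    "t2": [
        {"pos_candidates": ["el_2"], "predicted_correctly": True},
        {"pos_candidates": ["el_3"], "predicted_correctly": True},
    ],
}

strict = whole_task_success_rate(toy_tasks)                    # empty step = failure
lenient = whole_task_success_rate(toy_tasks, skip_empty=True)  # empty step ignored
print(strict, lenient)
```

In this toy case the strict policy gives 0.5 and the lenient one gives 1.0, even though the agent made no actual mistake in either task.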

liaopeiyuan · Jun 25 '24 05:06