SeeAct
What is the meaning of an empty `pos_candidates`?
There are 761 rows in the HuggingFace dataset `osunlp/Multimodal-Mind2Web` that have an empty `pos_candidates`.
These rows span 497 tasks, broken down by split:
{'test_domain': 164, 'test_task': 47, 'test_website': 34, 'train': 252}
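For anyone who wants to reproduce these counts, here is a minimal sketch of the tally. It assumes each row exposes a `pos_candidates` list and an `annotation_id` that identifies the task; the helper name `summarize_empty_pos_candidates` and the toy rows are made up for illustration and are not part of the SeeAct code.

```python
from collections import Counter


def summarize_empty_pos_candidates(rows_by_split):
    """Count rows with an empty pos_candidates and the affected tasks per split.

    rows_by_split maps a split name to an iterable of dict-like rows.
    Assumption: each row has "pos_candidates" (a list) and "annotation_id".
    """
    empty_rows = 0
    tasks_per_split = Counter()
    for split, rows in rows_by_split.items():
        affected_tasks = set()
        for row in rows:
            if len(row["pos_candidates"]) == 0:
                empty_rows += 1
                affected_tasks.add(row["annotation_id"])
        tasks_per_split[split] = len(affected_tasks)
    return empty_rows, dict(tasks_per_split)


# Toy rows standing in for the real dataset (hypothetical values).
toy = {
    "train": [
        {"annotation_id": "task-a", "pos_candidates": ["candidate-1"]},
        {"annotation_id": "task-a", "pos_candidates": []},
        {"annotation_id": "task-b", "pos_candidates": []},
    ],
    "test_task": [
        {"annotation_id": "task-c", "pos_candidates": ["candidate-2"]},
    ],
}

empty_rows, tasks_per_split = summarize_empty_pos_candidates(toy)
print(empty_rows, tasks_per_split)
```

With the real data, each split could be loaded with `datasets.load_dataset("osunlp/Multimodal-Mind2Web", split=...)` and passed in under its split name.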
Here's a sample task that has an empty `pos_candidates` in one of its steps: https://huggingface.co/datasets/osunlp/Multimodal-Mind2Web/viewer/default/train?q=6687eb6c-7154-4176-83a8-e841f78089d9 (row=1659)
It appears that `src/data_utils/evaluation_utils.py` and `src/offline_experiments/screenshot_generation/*.py` treat an empty `pos_candidates` as a failure of the agent. Since "A task is regarded as successful only if all steps have succeeded," any task containing such a step would then be counted as failed regardless of what the agent does, which makes it unclear how to interpret the "whole success rate" accuracy gap in Table 4.
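To make the concern concrete, here is a toy sketch of how the whole-task success rate shifts depending on how steps without a positive candidate are treated. This is not the repository's evaluation code; the function `whole_task_success_rate`, the `has_pos_candidate` and `correct` fields, and the sample tasks are all hypothetical.

```python
def whole_task_success_rate(tasks, empty_counts_as_failure=True):
    """Fraction of tasks in which every step succeeded.

    tasks maps a task id to a list of steps. Each step is a dict with
    "has_pos_candidate" and "correct" booleans (hypothetical fields).
    """
    successes = 0
    for steps in tasks.values():
        task_ok = True
        for step in steps:
            if not step["has_pos_candidate"]:
                # No ground-truth element exists for this step.
                if empty_counts_as_failure:
                    task_ok = False
                continue  # otherwise the step is skipped
            if not step["correct"]:
                task_ok = False
        if task_ok:
            successes += 1
    return successes / len(tasks)


toy_tasks = {
    "t1": [{"has_pos_candidate": True, "correct": True}],
    "t2": [
        {"has_pos_candidate": True, "correct": True},
        {"has_pos_candidate": False, "correct": False},
    ],
    "t3": [{"has_pos_candidate": True, "correct": False}],
    "t4": [{"has_pos_candidate": True, "correct": True}],
}

strict = whole_task_success_rate(toy_tasks, empty_counts_as_failure=True)
lenient = whole_task_success_rate(toy_tasks, empty_counts_as_failure=False)
print(strict, lenient)  # 0.5 0.75
```

In this toy setup, task `t2` fails under the strict reading only because of the step with no positive candidate, so the two readings give different whole-task rates for the same agent behavior.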