Prompts for other models
Hi, I am trying to compare models using ScreenSpot. What were the prompts you used for QwenVL, Fuyu, and CogAgent?
Hi,
For CogAgent, we randomly pick one of three of their official prompts, as prompts = ["What steps do I need to take to \"{}\"?(with grounding)", "Can you advise me on how to \"{}\"?(with grounding)", "I'm looking for guidance on how to \"{}\".(with grounding)"]
For Fuyu, we determined the prompt based on discussions with the authors, as in https://huggingface.co/adept/fuyu-8b/discussions/42. Probably "When presented with a box, perform OCR to extract text contained within it. If provided with text, generate the corresponding bounding box.\n{}"
For Qwen-VL, we follow their official example on GitHub, probably "Generate the bounding box of {}".
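To make the above concrete, here is a minimal sketch assembling the three templates (the helper name `build_prompt` is hypothetical; the CogAgent templates are quoted verbatim from this thread, and the Fuyu/Qwen-VL templates are the "probably" guesses above, so treat them as assumptions):

```python
import random

# CogAgent: three official prompts, one picked at random per example.
COGAGENT_PROMPTS = [
    'What steps do I need to take to "{}"?(with grounding)',
    'Can you advise me on how to "{}"?(with grounding)',
    'I\'m looking for guidance on how to "{}".(with grounding)',
]
# Fuyu: template suggested in the linked HF discussion (assumption).
FUYU_PROMPT = (
    "When presented with a box, perform OCR to extract text contained "
    "within it. If provided with text, generate the corresponding "
    "bounding box.\n{}"
)
# Qwen-VL: template from their official GitHub example (assumption).
QWENVL_PROMPT = "Generate the bounding box of {}"

def build_prompt(model: str, instruction: str) -> str:
    """Fill the given model's template with a ScreenSpot instruction."""
    if model == "cogagent":
        return random.choice(COGAGENT_PROMPTS).format(instruction)
    if model == "fuyu":
        return FUYU_PROMPT.format(instruction)
    if model == "qwen-vl":
        return QWENVL_PROMPT.format(instruction)
    raise ValueError(f"unknown model: {model}")
```

For example, `build_prompt("qwen-vl", "the close button")` yields `"Generate the bounding box of the close button"`, while the CogAgent branch draws uniformly from the three templates.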
Hi @njucckevin, Regarding the CogAgent prompts, how did you decide which prompt to use when testing with ScreenSpot data? Do you just randomize between those three prompts, or are there other criteria? Thanks.
Hi,
There are no special criteria; we just pick one of these three prompts at random. They are the first three provided in the official CogAgent repo: https://github.com/THUDM/CogVLM/blob/f7283b2c8d26cd7f932d9a5f7f5f9307f568195d/utils/utils/template.py#L761.