Qwen2.5-VL: prompt for ScreenSpot-Pro evaluation

Open: Ancolie18 opened this issue 2 weeks ago • 1 comment

When I tested Qwen2-VL on ScreenSpot-Pro, the model's output format was consistently of the form "<|object_ref_start|>the selection<|object_ref_end|><|box_start|>(501,10),(995,987)<|box_end|><|im_end|>".
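To make that format concrete, here is a minimal sketch of how such an output could be parsed. It assumes the box follows a `<|box_start|>` token as two `(x,y)` pairs, and that coordinates are normalized to a 0-1000 range (the `scale` parameter, an assumption to verify for your checkpoint). The function names `parse_qwen2vl_box` and `box_center_pixels` are hypothetical and not part of any evaluation script.

```python
import re

# Two "(x,y)" pairs following the box-start token; whitespace inside the
# token and between the pairs is tolerated. The closing token is not required.
BOX_RE = re.compile(
    r"<\|\s*box_start\s*\|>\s*"
    r"\(\s*(\d+)\s*,\s*(\d+)\s*\)\s*,\s*\(\s*(\d+)\s*,\s*(\d+)\s*\)"
)


def parse_qwen2vl_box(output):
    """Return (x1, y1, x2, y2) as ints, or None if no box is found."""
    match = BOX_RE.search(output)
    if match is None:
        return None
    return tuple(int(group) for group in match.groups())


def box_center_pixels(box, width, height, scale=1000):
    """Map the box center from the assumed 0..scale range to pixel coordinates."""
    x1, y1, x2, y2 = box
    center_x = (x1 + x2) / 2 * width / scale
    center_y = (y1 + y2) / 2 * height / scale
    return center_x, center_y
```

The center point is computed because a click-style evaluation typically checks whether a single predicted point falls inside the ground-truth box; whether that matches your evaluation script is something to confirm.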

When I evaluated Qwen2.5-VL, I kept the same prompt as for Qwen2-VL (this prompt is provided by ScreenSpot-Pro):

prompt_origin = 'Output the bounding box in the image corresponding to the instruction "{}" with grounding.'
full_prompt = prompt_origin.format(instruction)
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": image_path,
            },
            {"type": "text", "text": full_prompt},
        ],
    }
]

However, the model's output format is different and not consistent across samples, for example:

[Four screenshots of Qwen2.5-VL outputs, not recoverable here]

Do you have any recommended prompts for Qwen2.5-VL? How can I ensure a consistent output format for more reliable evaluation?
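As a stopgap on the evaluation side, one option is a parser that ignores the wrapper and extracts the first run of four numbers. The sketch below is only an illustration: the JSON-style input in the usage note is a hypothetical example, not a confirmed Qwen2.5-VL format, and `extract_first_box` is a made-up helper name. It also does not tell you whether the numbers are normalized or absolute pixel coordinates, which would still need to be checked per model.

```python
import re

NUMBER = r"(\d+(?:\.\d+)?)"
SEPARATOR = r"[\s,;()\[\]]+"

# Four numbers separated only by punctuation or whitespace. The lookbehind
# skips digits embedded in identifiers (e.g. the "2" in a key like "bbox_2d").
FOUR_NUMBERS_RE = re.compile(r"(?<![\w.])" + SEPARATOR.join([NUMBER] * 4))


def extract_first_box(output):
    """Return the first four consecutive numbers as floats, or None."""
    match = FOUR_NUMBERS_RE.search(output)
    if match is None:
        return None
    return tuple(float(group) for group in match.groups())
```

For instance, `extract_first_box('[10, 20, 30, 40]')` and `extract_first_box('(10,20),(30,40)')` both yield the same four values, so the scoring code downstream does not depend on which wrapper the model chose.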

Thank you!!

Ancolie18, Feb 11 '25 06:02