script for evaluation & prompt for ScreenSpot grounding
Can you release the script for evaluation? What is the prompt for evaluating ScreenSpot-v2 and Android Control? Thanks for your reply.
Hi, we are still refining the evaluation framework code to remove dependencies on some internal packages. Here, I've copied the main part of the grounding evaluation code, which should be enough to reproduce the experimental results.
Grounding Task https://gist.github.com/LukeForeverYoung/5596191c837a930dfffacdb2e3dc8ac0
Android Control For android control, we follow qwen2.5vl and use this script for evaluation. https://gist.github.com/LukeForeverYoung/1f5d19495788de0d905c5ac6341153f5
The main difference of evaluation on android control is that we use a multi-round multi-image format to organize the context, which can be found in our cookbook.
Thank you for your reply.Could you clarify what "multi-image" refers to in the term "multi-image format"? Does it mean historical screenshots from multi-turn interactions?