MobileAgent script for evaluation & prompt for ScreenSpot grounding

Can you release the script for evaluation? What is the prompt for evaluating ScreenSpot-v2 and Android Control? Thanks for your reply.

Aug 25 '25 12:08 jangyicheng

Hi, we are still refining the evaluation framework code to remove dependencies on some internal packages. Here, I've copied the main part of the grounding evaluation code, which should be enough to reproduce the experimental results.

Grounding Task https://gist.github.com/LukeForeverYoung/5596191c837a930dfffacdb2e3dc8ac0

Android Control For android control, we follow qwen2.5vl and use this script for evaluation. https://gist.github.com/LukeForeverYoung/1f5d19495788de0d905c5ac6341153f5

The main difference of evaluation on android control is that we use a multi-round multi-image format to organize the context, which can be found in our cookbook.

Aug 26 '25 02:08 LukeForeverYoung

Thank you for your reply.Could you clarify what "multi-image" refers to in the term "multi-image format"? Does it mean historical screenshots from multi-turn interactions?

Aug 30 '25 10:08 jangyicheng