
Mobile Device Evaluations -- AndroidControl, GUIOdyssey, etc.

Open · ckgresla opened this issue 6 months ago · 3 comments

Hello!

Awesome X announcement that you folks put out for Simular, and kudos on the Agent S and Agent S2 papers.

I was curious about the performance of the Agent S2 system on some notable static, offline datasets for Android device manipulation. While AndroidWorld provides one useful signal of a system's ability to operate mobile phones, there is a wide variety of tasks that are not captured in its distribution, which I believe are present in other open-source device manipulation datasets, such as:

  1. AndroidControl
  2. AMEX
  3. GUIOdyssey

Would it be possible to evaluate the Agent S2 system on these data sources? This question is complementary to this issue about the evaluation setup for AndroidWorld.

ckgresla · Jun 24 '25 17:06

following

beibidesr · Jul 04 '25 08:07

🥇

chethanuk · Oct 04 '25 21:10

I think the community would also be curious about the artifacts for the latest Agent S3 model. Would it be possible to make some of the inference results available, for example trajectories of the model's interactions with the environments (screenshots annotated with the action taken at each step)?

ckgresla · Oct 07 '25 23:10