How to pass action history to model
Hi, in your paper you mention that your model "takes as input the task instruction, the history of prior interactions (o_1, a_1, ..., o_{i-1}, a_{i-1}), and the current observation o_i". The example prompts you provide in the Readme include the task instruction and the current image observation, but they do not include the history of actions and previous observation screenshots. I am wondering how you pass this information to the model (i.e. the format it is trained to expect)?
TLDR: How do you pass the history of previous actions and screenshot observations to your model for multi-step tasks, and how do I integrate that into the prompts provided in the Readme?
I think you can keep the text of previous actions and drop the images. In UI-TARS desktop, the client keeps at most 5 images due to the context limit.
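For illustration, here is a minimal sketch of what that could look like with an OpenAI-compatible chat API. The message layout (one user image turn plus one assistant action turn per step), the `build_messages` name, and the `MAX_IMAGES` cutoff are my assumptions based on the description above, not the actual UI-TARS client code:

```python
# Sketch: build a multi-turn message list where every prior action is kept as
# text but only the most recent MAX_IMAGES screenshots are sent as images.
# Assumption: `history` is a list of (screenshot_base64, action_text) pairs and
# the model is served behind an OpenAI-compatible /chat/completions endpoint.
MAX_IMAGES = 5  # UI-TARS desktop reportedly keeps at most 5 images


def build_messages(instruction, history, current_screenshot_b64):
    messages = [{"role": "system", "content": instruction}]
    # Index of the first history step whose screenshot is still sent as an image
    # (the +1 reserves one slot for the current observation).
    first_kept = max(0, len(history) + 1 - MAX_IMAGES)
    for i, (screenshot_b64, action_text) in enumerate(history):
        if i >= first_kept:
            user_content = [{
                "type": "image_url",
                "image_url": {"url": f"data:image/png;base64,{screenshot_b64}"},
            }]
        else:
            # Older screenshots are dropped; only the action text survives.
            user_content = [{"type": "text", "text": "(screenshot omitted)"}]
        messages.append({"role": "user", "content": user_content})
        # The model's previous output (the action) is stored as an assistant turn.
        messages.append({"role": "assistant", "content": action_text})
    # The current observation always goes in as an image.
    messages.append({
        "role": "user",
        "content": [{
            "type": "image_url",
            "image_url": {"url": f"data:image/png;base64,{current_screenshot_b64}"},
        }],
    })
    return messages
```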
Hmm, I tried doing this (in AndroidWorld), but the model does not perform very well (I expected it to do better based on the paper's results). Could someone share the prompt used for AndroidWorld, if it is different from the one in the Readme, and explain how the history is added into that prompt?
Same question
Do you mean using assistant messages across multiple rounds of the conversation to store the responses from the previous round?
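Roughly like this, as a sketch of the round-by-round loop: each reply is appended back as an assistant turn before the next observation. This assumes the hypothetical `build_messages` helper from the earlier reply and an OpenAI-compatible client; `run_episode`, `get_screenshot`, and `run_action` are placeholder names I made up, not the project's API:

```python
def run_episode(client, model, instruction, get_screenshot, run_action, max_steps=15):
    """Round-by-round agent loop (sketch, not the official implementation).

    get_screenshot() and run_action(text) are callables supplied by your own
    environment; the model's reply is kept as text for the next round.
    """
    history = []  # list of (screenshot_b64, action_text) pairs
    for _ in range(max_steps):
        shot = get_screenshot()
        messages = build_messages(instruction, history, shot)
        reply = client.chat.completions.create(model=model, messages=messages)
        action_text = reply.choices[0].message.content
        run_action(action_text)
        history.append((shot, action_text))
    return history
```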
same question
Please refer to the example for guidance.