UI-TARS icon indicating copy to clipboard operation
UI-TARS copied to clipboard

How to pass action history to model

Open tlc4418 opened this issue 10 months ago • 5 comments

Hi, in your paper you mention that your model "takes as input the task instruction, the history of prior interactions (o1, a1, · · · , oi−1, ai−1), and the current observation oi". You have provided example prompts in the Readme for your model that include the task instruction and the current img observation, but these do not include the history of actions and previous observation screenshots. I am wondering how you pass in this information to your model (i.e. how it is trained to expect this information)?

TLDR: How do you pass in the history of previous actions and screenshot observations to your model for multi-step tasks? How do I integrate this to the provided prompts in the Readme?

tlc4418 avatar Jan 27 '25 13:01 tlc4418

I think you can keep the text of previous actions and dropped the images. In the UI-TARS desktop, the client always keeps at most 5 images due to the context limit.

AHEADer avatar Jan 28 '25 02:01 AHEADer

Hmm I tried doing this (in AndroidWorld), but the model does not perform very well (I expect it to do better based on the paper results). I am wondering if someone can share the prompt used for AndroidWorldif it is different to the one in the Readme? And how you add the history into this prompt?

tlc4418 avatar Jan 28 '25 15:01 tlc4418

Same question

cyh2004 avatar Feb 07 '25 11:02 cyh2004

Do you want to use Assistant in multiple rounds of conversations to store responses from the previous round?

Image

momianhua avatar Feb 08 '25 08:02 momianhua

same question

manmushanhe avatar Feb 21 '25 03:02 manmushanhe

Please refer to the example for guidance.

JjjFangg avatar May 07 '25 13:05 JjjFangg