How to pass action history to model
Hi, in your paper you mention that your model "takes as input the task instruction, the history of prior interactions (o_1, a_1, ..., o_{i-1}, a_{i-1}), and the current observation o_i". The example prompts you provide in the Readme include the task instruction and the current image observation, but they do not include the history of actions and previous observation screenshots. I am wondering how you pass this information to the model (i.e. the format it is trained to expect)?
TLDR: How do you pass the history of previous actions and screenshot observations to your model for multi-step tasks, and how do I integrate that into the prompts provided in the Readme?
I think you can keep the text of previous actions and drop the images. In UI-TARS desktop, the client keeps at most 5 images due to the context limit.
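For illustration, here is a minimal sketch of what that could look like with an OpenAI-compatible chat API. The message layout (one user image turn plus one assistant action turn per step), the `build_messages` name, and the `MAX_IMAGES` cutoff are my assumptions based on the description above, not the actual UI-TARS client code:

```python
# Sketch: build a multi-turn message list where every prior action is kept as
# text but only the most recent MAX_IMAGES screenshots are sent as images.
# Assumption: `history` is a list of (screenshot_base64, action_text) pairs and
# the model is served behind an OpenAI-compatible /chat/completions endpoint.
MAX_IMAGES = 5  # UI-TARS desktop reportedly keeps at most 5 images


def build_messages(instruction, history, current_screenshot_b64):
    messages = [{"role": "system", "content": instruction}]
    # Index of the first history step whose screenshot is still sent as an image
    # (the +1 reserves one slot for the current observation).
    first_kept = max(0, len(history) + 1 - MAX_IMAGES)
    for i, (screenshot_b64, action_text) in enumerate(history):
        if i >= first_kept:
            user_content = [{
                "type": "image_url",
                "image_url": {"url": f"data:image/png;base64,{screenshot_b64}"},
            }]
        else:
            # Older screenshots are dropped; only the action text survives.
            user_content = [{"type": "text", "text": "(screenshot omitted)"}]
        messages.append({"role": "user", "content": user_content})
        # The model's previous output (the action) is stored as an assistant turn.
        messages.append({"role": "assistant", "content": action_text})
    # The current observation always goes in as an image.
    messages.append({
        "role": "user",
        "content": [{
            "type": "image_url",
            "image_url": {"url": f"data:image/png;base64,{current_screenshot_b64}"},
        }],
    })
    return messages
```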
Hmm, I tried doing this (in AndroidWorld), but the model does not perform very well (I expected it to do better based on the paper's results). Could someone share the prompt used for AndroidWorld, if it is different from the one in the Readme, and explain how the history is added into that prompt?
Same question
Do you mean using assistant messages across multiple rounds of the conversation to store the responses from the previous round?
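Roughly like this, as a sketch of the round-by-round loop: each reply is appended back as an assistant turn before the next observation. This assumes the hypothetical `build_messages` helper from the earlier reply and an OpenAI-compatible client; `run_episode`, `get_screenshot`, and `run_action` are placeholder names I made up, not the project's API:

```python
def run_episode(client, model, instruction, get_screenshot, run_action, max_steps=15):
    """Round-by-round agent loop (sketch, not the official implementation).

    get_screenshot() and run_action(text) are callables supplied by your own
    environment; the model's reply is kept as text for the next round.
    """
    history = []  # list of (screenshot_b64, action_text) pairs
    for _ in range(max_steps):
        shot = get_screenshot()
        messages = build_messages(instruction, history, shot)
        reply = client.chat.completions.create(model=model, messages=messages)
        action_text = reply.choices[0].message.content
        run_action(action_text)
        history.append((shot, action_text))
    return history
```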
same question
Please refer to the example for guidance.