UI-TARS-1.5-7B Endless Loops on Web Interfaces
We've observed that the UI-TARS-1.5-7B agent frequently gets stuck in endless retry loops when interacting with web interfaces. It repeats the same ineffective action and does not adapt or try a different strategy.
Example of observed behavior:
Thought: I see a search button at the top of the page, it's the magnifying glass icon in the upper right corner. To find information about Apple Pencil, I need to click this search button to open the search box.
Action: click(start_box='(1099,64)')
Thought: I see a search button at the top of the page, it's the magnifying glass icon in the upper right corner. To find information about Apple Pencil, I need to click this search button to open the search box.
Action: click(start_box='(1099,64)')
Thought: I see a search button at the top of the page, it's the magnifying glass icon in the upper right corner. To find information about Apple Pencil, I need to click this search button to open the search box.
Action: click(start_box='(1099,64)')
[This exact Thought and Action pair repeats 15 times without change]
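As a stopgap on the harness side, a loop like the one above could be caught by comparing consecutive actions. The following is only a minimal sketch, not part of UI-TARS: `extract_action`, `RepeatDetector`, and the `max_repeats` threshold are hypothetical names, and it assumes each model step is plain text containing an `Action:` line in the format shown in the log.

```python
import re
from collections import deque
from typing import Deque, Optional

# Assumes each step contains a line of the form "Action: <call>", as in the log above.
ACTION_RE = re.compile(r"^Action:\s*(.+)$", re.MULTILINE)


def extract_action(step_text: str) -> Optional[str]:
    """Return the action string from one model step, or None if absent."""
    match = ACTION_RE.search(step_text)
    return match.group(1).strip() if match else None


class RepeatDetector:
    """Hypothetical helper: flags when the same action occurs N times in a row."""

    def __init__(self, max_repeats: int = 3) -> None:
        self.max_repeats = max_repeats
        # Only the most recent max_repeats actions are kept.
        self._recent: Deque[str] = deque(maxlen=max_repeats)

    def observe(self, action: Optional[str]) -> bool:
        """Record an action; return True once the last max_repeats are identical."""
        if action is None:
            # A step without an action breaks the streak.
            self._recent.clear()
            return False
        self._recent.append(action)
        return (
            len(self._recent) == self.max_repeats
            and len(set(self._recent)) == 1
        )


if __name__ == "__main__":
    detector = RepeatDetector(max_repeats=3)
    step = "Thought: I need to click the search button.\nAction: click(start_box='(1099,64)')"
    for i in range(5):
        if detector.observe(extract_action(step)):
            print(f"Loop detected at step {i + 1}; stop or re-prompt here.")
            break
```

When the detector fires, the harness could abort the episode, take a fresh screenshot, or tell the model that its previous action had no effect. Which of these actually helps the model recover is untested here.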
We also need deployment guidelines for web operations, including:
- Recommended screen resolution settings
- Input prompt formatting and concatenation methods for web tasks
- Specific configuration steps for web deployment
Please provide relevant documentation or consider creating web-specific deployment guidelines.
We have provided documentation covering deployment and inference procedures. For the first two items (screen resolution settings and input prompt formatting), please refer to the following section of the UI-TARS repository: 👉 Quick Start Guide: Deploying and Using Our Model
For the third item (configuration steps for web deployment), you may refer to the UI-TARS-desktop deployment guide here: 👉 UI-TARS-Desktop Quick Start