JjjFangg

Results 65 comments of JjjFangg

That is correct. In the training of UI-TARS-1.5, we have optimized the action space for mobile scenarios, and you can directly use the latest prompt.

Yes, Chinese Thought yields better performance.

Here is an example of a visual grounding task. """You are a GUI agent. You are given a task and your action history, with screenshots. You need to perform the next...

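For the data format, you can refer to this [example](https://github.com/bytedance/UI-TARS/blob/main/data/training_example.json). The amount of data needed varies by domain; you can scale up gradually and estimate it from the results you actually observe.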

We have updated the [tutorial](https://github.com/bytedance/UI-TARS/blob/main/README_coordinates.md) on coordinate processing.

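We have not observed a similar issue when running inference locally. We suggest first checking whether the problem comes from the inference framework.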

To ensure consistent outputs for the same input, we recommend disabling the sampling options in the inference parameters.
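As a rough illustration of why disabling sampling makes outputs repeatable, here is a toy sketch in plain Python. The function `pick_next_token` and its parameters are hypothetical and not part of UI-TARS; they only mimic the difference between greedy decoding and sampling. In Hugging Face `transformers` the corresponding switch is `do_sample=False` in `generate`, and on OpenAI-compatible servers it is usually `temperature=0`; the exact parameter names depend on the inference framework you use.

```python
import math
import random


def pick_next_token(logits, do_sample=False, temperature=1.0, rng=None):
    """Toy next-token selection (hypothetical helper, for illustration only)."""
    if not do_sample:
        # Greedy decoding: always take the highest-scoring token,
        # so the same logits always give the same choice.
        return max(range(len(logits)), key=lambda i: logits[i])

    # Sampling: draw from the temperature-scaled softmax distribution,
    # so repeated calls on the same logits can give different tokens.
    rng = rng or random.Random()
    scaled = [value / temperature for value in logits]
    peak = max(scaled)
    weights = [math.exp(value - peak) for value in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


logits = [0.1, 2.5, 1.7]
greedy_choices = {pick_next_token(logits) for _ in range(20)}
sampled_choice = pick_next_token(logits, do_sample=True, rng=random.Random(0))
```

With sampling disabled, `greedy_choices` contains a single token index no matter how many times the call is repeated, which is the behavior the recommendation above relies on.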

Yes, UI-TARS-1.5-7B has been trained to output coordinates only as points.

The coordinate handling logic in UI-TARS-1.5 differs from that of UI-TARS. Please refer to the following [guide](https://github.com/xlang-ai/OSWorld/blob/main/mm_agents/uitars_agent.py) for deployment instructions.
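The linked guide is the authoritative reference. As a minimal sketch of one step that such coordinate handling commonly involves, the snippet below maps a point predicted on a resized image back to the original screenshot. It assumes the model emits pixel coordinates in the resized image's space; that convention, the function name `map_point_to_original`, and the example sizes are illustrative assumptions, not taken from the UI-TARS code.

```python
def map_point_to_original(x, y, resized_size, original_size):
    """Scale a point from resized-image pixels to original-screenshot pixels.

    Assumption (not confirmed by the source): the model's coordinates are
    expressed in the pixel space of the resized image it was given.
    """
    resized_w, resized_h = resized_size
    orig_w, orig_h = original_size
    return (x / resized_w * orig_w, y / resized_h * orig_h)


# Hypothetical sizes: the model saw a 1000x500 image of a 2000x1000 screenshot.
click_x, click_y = map_point_to_original(500, 250, (1000, 500), (2000, 1000))
```

If the model instead emits coordinates normalized to a fixed range, the same function applies with `resized_size` set to that range.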