JjjFangg comments

Results 65 comments of


                                            JjjFangg

any idea to release the curated tutorial dataset or tutorial clean/filtering code?

Sorry, we do not have plans for this at the moment.

Confuse about ActionSpace for MobileUse

That is correct. In the training of UI-TARS-1.5, we have optimized the action space for mobile scenarios, and you can directly use the latest prompt.

Will Chinese Thought yield better performance?

Yes, Chinese Thought yields better performance.

UITARS prompt for visual grounding only

Here are an example of visual grounding task. """You are a GUI agent. You are given a task and your action history, with screenshots. You need to perform the next...

想针对特定领域的机器做适配微调，请教一下数据集应该如何构建

数据格式可以参考这个[样例](https://github.com/bytedance/UI-TARS/blob/main/data/training_example.json)，数据量不同领域都不太一样可以逐步scale根据实际效果来估计哈

UI-TARS-1.5-7B element locator accuracy significantly degraded

We have updated the [tutorial](https://github.com/bytedance/UI-TARS/blob/main/README_coordinates.md) on coordinate processing.

同一prompt多次请求，输出不同坐标结果。这是正常的吗？还是有什么bug？

To ensure consistent outputs for the same input, we recommend disabling the sampling options in the inference parameters.

UI-TARS-1.5-7B would not output bounding box

Yes, UI-TARS-1.5-7B has been trained to allow output only in the form of points.

The coordinate handling logic in UI-TARS-1.5 differs from that of UI-TARS. Please refer to the following [guide](https://github.com/xlang-ai/OSWorld/blob/main/mm_agents/uitars_agent.py) for deployment instructions.

JjjFangg