UI-TARS: UI-TARS-72B-DPO gets a 0% success rate and terminates early when reproducing the OSWorld benchmark

Environment:

Model version: UI-TARS-72B-DPO
Run script: official run_uitars.py
Experiment settings: pyautogui + screenshot_a11y_tree
Hardware: 4x A6000 (48 GB per GPU)
Dependency version: vLLM 0.7.3

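While reproducing the OSWorld benchmark, I have so far run every sample in test_small.json. The model outputs "FAIL" on its own within 3 steps and terminates the task, so the success rate is 0%. Specifically, the output is garbled, and every sample is abandoned at ≤3 steps.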

Mar 17 '25 05:03 Dizzy-K

Could it be that the maximum number of steps was not set, so it defaulted to 3?

Mar 17 '25 07:03 Asot2887

Isn't max_trajectory_length supposed to mean that the model sees only the most recent 3 steps each time?

Mar 17 '25 14:03 SunzeY

Has this been resolved? I'm running into the same problem.

Apr 13 '25 04:04 spidercatfly

--max_trajectory_length 15: adding this to the launch arguments fixes it.
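
For anyone unsure what this flag changes, here is a minimal sketch. It assumes, as suggested earlier in the thread, that max_trajectory_length limits how many recent steps are kept in the model's context rather than capping the total number of steps. The function name recent_history and the (observation, action) step format are hypothetical and are not taken from run_uitars.py.

```python
from typing import List, Tuple

# Hypothetical step representation: (observation, action).
Step = Tuple[str, str]


def recent_history(trajectory: List[Step], max_trajectory_length: int) -> List[Step]:
    """Return the most recent steps that would be kept in the prompt.

    Assumption: the flag acts as a sliding window over past steps.
    """
    if max_trajectory_length <= 0:
        # Guard needed because trajectory[-0:] would return the whole list.
        return []
    return trajectory[-max_trajectory_length:]


# Ten completed steps of an imaginary episode.
trajectory = [(f"obs_{i}", f"action_{i}") for i in range(1, 11)]

# With a window of 3 only the last three steps remain visible.
print(len(recent_history(trajectory, 3)))
# With a window of 15 all ten steps so far remain visible.
print(len(recent_history(trajectory, 15)))
```

Under this assumption, a window of 3 would hide most of the earlier steps from the model, while 15 keeps much more of the history in view.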

Apr 22 '25 08:04 JYX1216

Hi, have you managed to reproduce the results with UI-TARS-72B? During my reproduction, I noticed that the model always repeats the same action, and the performance is inconsistent with the paper.

Jun 17 '25 02:06 super-jw