UI-TARS-72B-DPO gets a 0% success rate and terminates early when reproducing the OSWorld benchmark

Open Dizzy-K opened this issue 9 months ago • 4 comments

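Environment configuration:

- Model version: UI-TARS-72B-DPO
- Run script: the official run_uitars.py
- Experiment settings: pyautogui + screenshot_a11y_tree
- Hardware: 4x A6000 (48 GB per GPU)
- Dependency version: vLLM 0.7.3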

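While reproducing the OSWorld benchmark, I have so far run every sample in test_small.json. The model outputs "FAIL" on its own within 3 steps and terminates the task, so the success rate is 0%. Specifically, the model output is garbled, and every sample is abandoned in ≤3 steps.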

Dizzy-K, Mar 17 '25 05:03

Could it be that the maximum number of steps was never set, so it defaulted to 3?

Asot2887, Mar 17 '25 07:03

Shouldn't max_trajectory_length mean that the model sees only the most recent 3 steps at each turn?
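
To make that distinction concrete, here is a minimal sketch of the two different limits being discussed: a cap on how many steps an episode may run, and a cap on how many past steps the model is shown. The function names and the `max_steps` parameter are hypothetical and are not taken from run_uitars.py; only the name `max_trajectory_length` comes from this thread.

```python
from typing import List


def visible_history(trajectory: List[str], max_trajectory_length: int) -> List[str]:
    """Return the most recent steps the model would be shown (hypothetical helper)."""
    if max_trajectory_length <= 0:
        # Guard: trajectory[-0:] would return the whole list, not an empty one.
        return []
    return trajectory[-max_trajectory_length:]


def run_episode(actions: List[str], max_steps: int, max_trajectory_length: int) -> List[List[str]]:
    """Toy loop: max_steps caps the episode length, the window caps the context."""
    trajectory: List[str] = []
    contexts: List[List[str]] = []
    for action in actions[:max_steps]:
        # Record what the model would see before taking this step.
        contexts.append(visible_history(trajectory, max_trajectory_length))
        trajectory.append(action)
    return contexts


contexts = run_episode(["a1", "a2", "a3", "a4", "a5"], max_steps=5, max_trajectory_length=3)
for step, context in enumerate(contexts, start=1):
    print(step, context)
```

Under this reading, a window of 3 limits the context at each step but does not by itself stop the episode after 3 steps. Whether run_uitars.py behaves this way is not confirmed in this thread.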

SunzeY, Mar 17 '25 14:03

Has this been resolved? I'm running into the same problem.

spidercatfly, Apr 13 '25 04:04

Adding --max_trajectory_length 15 to the launch arguments fixes it.
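
For anyone scripting the run, a minimal sketch of assembling that launch command from Python is below. Only the script name and the `--max_trajectory_length 15` flag come from this thread; the `build_command` helper is hypothetical, and any other arguments your setup needs (model endpoint, observation type, and so on) are deliberately left out because they are not specified here.

```python
import shlex
import sys
from typing import List


def build_command(script: str = "run_uitars.py", max_trajectory_length: int = 15) -> List[str]:
    """Assemble the argument list for the run script (hypothetical helper)."""
    return [
        sys.executable,  # use the same interpreter that runs this snippet
        script,
        "--max_trajectory_length",
        str(max_trajectory_length),
    ]


command = build_command()
# Print the command so it can be inspected before launching anything.
print(shlex.join(command))

# To actually launch, uncomment the lines below in the OSWorld checkout:
# import subprocess
# subprocess.run(command, check=True)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems if paths contain spaces.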

JYX1216, Apr 22 '25 08:04

Hi, have you managed to reproduce the results with UI-TARS-72B? In my own reproduction, the model keeps repeating the same action, and the performance is inconsistent with the paper.

super-jw, Jun 17 '25 02:06