Jason J comments

Results 8 comments of


                                            Jason J

UI-TARS-72B-DPO在OSWorld基准测试复现中成功率0%且提前终止

--max_trajectory_length 15 启动的参数加上这个就可以了

OSWorld cannot parse the response

@ZFish-Lu It seems that the problem is caused by the input limit of VLM. I adjusted the size of the input history_n and the problem was solved. Maybe the problem...

OSWorld cannot parse the response

> 在OSWorld的测评代码uitars_agent.py中，当observation_type为screenshot的时候出现bug：Invalid observation_type type: screenshot 应该是这部分代码有些问题 > > 我是直接设置的observation的type为screenshot_a11_tree，在predict构建prompt的时候只传入了screenshot

OSWorld cannot parse the response

> 你现在能成功复现吗？我这边测试特别慢，结果还没出来

OSWorld cannot parse the response

> > 运行OSWorld的时候遇到如下bug，无论是vllm部署还是基于transformers部署都会出现如下错误，部署端日志显示应该是传的message格式有问题 > > > > [server.log](https://github.com/user-attachments/files/19961586/server.log) > > 是评测代码里message拼接有问题，已成功运行是的，我之前也是这个问题，应该写成dict的格式

android control test question

@manmushanhe Hello, I also encountered some problems when reproducing the sales results of this paper. Can you share your test code? Thank you very much.

android control test question

@manmushanhe Thank you very much for sharing the test code. It will be very helpful for me.

uitars的任务执行出问题

还有action space定义也是不一样，osworld当中的prompt定义是box的坐标，而ui-tars定义的是一个点的坐标，使用osworld当中的prompt返回还是一个坐标点而不是box（两个坐标）