UI-TARS OSWorld cannot parse the response

I pulled the latest osworld code and ran into the following problems when trying to reproduce the work:

1. The project calls the chat.completions.create() method of OpenAI or similar interfaces. However, this method does not support the top_k parameter, so an error is reported.

2. There is no need to pass in the model parameter when initializing the Agent.

In run_uitars.py this parameter is passed, and the problem goes away after commenting it out.
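For the top_k error, one possible workaround is sketched below. It assumes the openai Python client, whose chat.completions.create() accepts an extra_body argument, and an OpenAI-compatible server such as vLLM that reads top_k from the request body. The helper name build_request_kwargs, the STANDARD_KEYS set, and the model name ui-tars are made up for illustration and are not taken from the OSWorld code.

```python
# Keys that this sketch treats as regular named arguments of
# chat.completions.create(). The set is partial and chosen for illustration.
STANDARD_KEYS = {"model", "messages", "temperature", "top_p", "max_tokens"}


def build_request_kwargs(params: dict) -> dict:
    """Move non-standard sampling parameters (e.g. top_k) into extra_body."""
    kwargs = {k: v for k, v in params.items() if k in STANDARD_KEYS}
    extra = {k: v for k, v in params.items() if k not in STANDARD_KEYS}
    if extra:
        # extra_body is merged into the JSON request body by the openai client.
        kwargs["extra_body"] = extra
    return kwargs


params = {
    "model": "ui-tars",  # hypothetical served model name
    "messages": [],
    "temperature": 0.0,
    "top_p": 0.9,
    "top_k": -1,
}
request_kwargs = build_request_kwargs(params)
# With a configured client, the call would then be:
# client.chat.completions.create(**request_kwargs)
```

If the backend does not understand top_k at all, simply dropping the key from the parameters is the simpler option.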

After fixing these two problems, a new problem appeared when I started the reproduction run.

It seems that the model's response cannot be parsed.

Apr 22 '25 12:04 JYX1216

I've also encountered the same problem.

Apr 24 '25 12:04 ZFish-Lu

@ZFish-Lu It seems that the problem is caused by the input limit of the VLM. I adjusted the value of history_n and the problem was solved. Adjusting the max_pixels parameter might also solve it, but that may affect the performance of the model.
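As a rough illustration of why history_n matters, the sketch below assumes that history_n caps how many of the most recent screenshots are sent to the model. The helper name keep_recent_images is hypothetical, and the actual logic in uitars_agent.py may differ.

```python
def keep_recent_images(history_images: list, history_n: int) -> list:
    """Return at most the last history_n screenshots."""
    if history_n <= 0:
        # Guard needed because history_images[-0:] would return the whole list.
        return []
    return history_images[-history_n:]


screenshots = ["step1.png", "step2.png", "step3.png", "step4.png", "step5.png"]
recent = keep_recent_images(screenshots, history_n=2)
```

A smaller history_n means fewer image tokens per request, which is one way to stay under the input limit of the model.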

Apr 25 '25 02:04 JYX1216

In OSWorld's evaluation code uitars_agent.py, a bug occurs when observation_type is screenshot: "Invalid observation_type type: screenshot". This part of the code seems to have a problem.

Apr 25 '25 07:04 ZFish-Lu

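When running OSWorld I hit the following bug. The same error occurs whether the model is deployed with vLLM or with transformers. The server-side log suggests that the format of the message being sent is wrong.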

server.log

Apr 29 '25 15:04 ZFish-Lu

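> In OSWorld's evaluation code uitars_agent.py, a bug occurs when observation_type is screenshot: "Invalid observation_type type: screenshot". This part of the code seems to have a problem.

I set the observation type directly to screenshot_a11_tree and passed only the screenshot when building the prompt in predict.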

Apr 30 '25 02:04 JYX1216

> @ZFish-Lu It seems that the problem is caused by the input limit of the VLM. I adjusted the value of history_n and the problem was solved. Adjusting the max_pixels parameter might also solve it, but that may affect the performance of the model.

I still hit the bug mentioned above. Have you managed to reproduce the results now?

Apr 30 '25 02:04 ZFish-Lu

> Have you managed to reproduce the results now?

Testing is very slow on my side, and the results are not out yet.

Apr 30 '25 03:04 JYX1216

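> When running OSWorld I hit the following bug. The same error occurs whether the model is deployed with vLLM or with transformers. The server-side log suggests that the format of the message being sent is wrong.
> [server.log](https://github.com/user-attachments/files/19961586/server.log)

The problem was in how the evaluation code assembles the message. It now runs successfully.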

Apr 30 '25 06:04 ZFish-Lu

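> > When running OSWorld I hit the following bug. The same error occurs whether the model is deployed with vLLM or with transformers. The server-side log suggests that the format of the message being sent is wrong. server.log

> The problem was in how the evaluation code assembles the message. It now runs successfully.

Yes, I had the same problem before. It should be written in dict format.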

Apr 30 '25 08:04 JYX1216

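> > > When running OSWorld I hit the following bug. The same error occurs whether the model is deployed with vLLM or with transformers. The server-side log suggests that the format of the message being sent is wrong. server.log

> > The problem was in how the evaluation code assembles the message. It now runs successfully.

> Yes, I had the same problem before. It should be written in dict format.

The image in the message should be passed in the following format: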

{ "role": "user", "content": [{"type": "image_url", "image_url": {"url": f"data:image/png;base64,{encoded_string}"}}] }
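A minimal, self-contained sketch of building such a message is shown below. The function name image_message and the placeholder bytes are illustrative only. In the evaluation code the bytes would come from the actual screenshot.

```python
import base64


def image_message(png_bytes: bytes) -> dict:
    """Wrap raw PNG bytes in a user message with one image_url content part."""
    encoded_string = base64.b64encode(png_bytes).decode("utf-8")
    return {
        "role": "user",
        "content": [
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/png;base64,{encoded_string}"},
            }
        ],
    }


# Placeholder input: only the 8-byte PNG signature, not a real screenshot.
fake_png = b"\x89PNG\r\n\x1a\n"
msg = image_message(fake_png)
```

The key point is that content is a list of dicts, each with a type field, rather than a bare string or a list of strings.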

May 01 '25 15:05 pooruss

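> > When running OSWorld I hit the following bug. The same error occurs whether the model is deployed with vLLM or with transformers. The server-side log suggests that the format of the message being sent is wrong. server.log

> The problem was in how the evaluation code assembles the message. It now runs successfully.

A question: in osworld the model_type is always qwen25vl. Did you deploy the 1.5 7B model with vLLM? If I deploy the 2B model and the parsed click position is wrong, would that also cause the error "Exception in chrome/7b6c7e24-c58a-49fc-a5bb-d57b80e5b4c3: local variable 'response' referenced before assignment"? Thanks.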

May 19 '25 08:05 chuheww

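Two bugs have been found so far in the OSWorld implementation, and a PR has been submitted to fix them: https://github.com/xlang-ai/OSWorld/pull/194

Also, the error "local variable 'response' referenced before assignment" is caused by invalid Python code. To check whether the click position is correct, you need to visualize the trace.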
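To illustrate how this kind of error can arise, here is a generic sketch, not the actual uitars_agent.py code. It assumes a retry loop that assigns response only inside a try block and swallows exceptions. All function names are hypothetical.

```python
def call_with_retries_buggy(send, max_tries=3):
    """Buggy pattern: `response` is never bound if every attempt raises."""
    for _ in range(max_tries):
        try:
            response = send()
            break
        except Exception:
            pass  # error swallowed, `response` stays unassigned
    return response  # raises UnboundLocalError when all attempts failed


def call_with_retries(send, max_tries=3):
    """Safer pattern: initialize first and surface the real failure."""
    response = None
    last_error = None
    for _ in range(max_tries):
        try:
            response = send()
            break
        except Exception as exc:
            last_error = exc
    if response is None:
        raise RuntimeError(f"request failed after {max_tries} tries") from last_error
    return response


def always_fails():
    # Stand-in for a request that the server rejects every time.
    raise ValueError("400 Bad Request")
```

With the second version, the underlying request failure is reported instead of being hidden behind the unbound-variable error.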

May 19 '25 08:05 pooruss

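> Two bugs have been found so far in the OSWorld implementation, and a PR has been submitted to fix them: xlang-ai/OSWorld#194

> Also, the error "local variable 'response' referenced before assignment" is caused by invalid Python code. To check whether the click position is correct, you need to visualize the trace.

Hello, I just pulled your run_uitars and uitars_agent scripts. Locally I deployed the 2B-SFT model with vLLM, and in run_uitars I set the observation type directly to screenshot_a11_tree. It then reported "local variable 'response' referenced before assignment", and the run cannot continue with the subsequent tasks. Could you help me look into this?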

May 19 '25 09:05 chuheww

> Two bugs have been found so far in the OSWorld implementation, and a PR has been submitted to fix them: xlang-ai/OSWorld#194

> Also, the error "local variable 'response' referenced before assignment" is caused by invalid Python code. To check whether the click position is correct, you need to visualize the trace.

Hello, my first step generates a result normally and correctly, but the second step fails to generate a response and jumps straight to the "local variable 'response' referenced before assignment" error.

I pulled your message format, so why is there still an error?

    INFO: 127.0.0.1:52696 - "POST /v1/chat/completions HTTP/1.1" 200 OK
    INFO 05-20 09:58:18 [loggers.py:111] Engine 000: Avg prompt throughput: 300.5 tokens/s, Avg generation throughput: 8.8 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 7.4%
    ERROR 05-20 09:58:18 [serving_chat.py:200] Error in preprocessing prompt inputs
    ERROR 05-20 09:58:18 [serving_chat.py:200] Traceback (most recent call last):
    ERROR 05-20 09:58:18 [serving_chat.py:200] File "/home/hello/miniforge3/envs/osworld1/lib/python3.9/site-packages/vllm/entrypoints/openai/serving_chat.py", line 183, in create_chat_completion
    ERROR 05-20 09:58:18 [serving_chat.py:200] ) = await self._preprocess_chat(
    ERROR 05-20 09:58:18 [serving_chat.py:200] File "/home/hello/miniforge3/envs/osworld1/lib/python3.9/site-packages/vllm/entrypoints/openai/serving_engine.py", line 403, in _preprocess_chat
    ERROR 05-20 09:58:18 [serving_chat.py:200] conversation, mm_data_future = parse_chat_messages_futures(
    ERROR 05-20 09:58:18 [serving_chat.py:200] File "/home/hello/miniforge3/envs/osworld1/lib/python3.9/site-packages/vllm/entrypoints/chat_utils.py", line 1165, in parse_chat_messages_futures
    ERROR 05-20 09:58:18 [serving_chat.py:200] sub_messages = _parse_chat_message_content(
    ERROR 05-20 09:58:18 [serving_chat.py:200] File "/home/hello/miniforge3/envs/osworld1/lib/python3.9/site-packages/vllm/entrypoints/chat_utils.py", line 1089, in _parse_chat_message_content
    ERROR 05-20 09:58:18 [serving_chat.py:200] result = _parse_chat_message_content_parts(
    ERROR 05-20 09:58:18 [serving_chat.py:200] File "/home/hello/miniforge3/envs/osworld1/lib/python3.9/site-packages/vllm/entrypoints/chat_utils.py", line 988, in _parse_chat_message_content_parts
    ERROR 05-20 09:58:18 [serving_chat.py:200] for part in parts:
    ERROR 05-20 09:58:18 [serving_chat.py:200] pydantic_core._pydantic_core.ValidationError: 2 validation errors for ValidatorIterator
    ERROR 05-20 09:58:18 [serving_chat.py:200] 0.ChatCompletionContentPartTextParam
    ERROR 05-20 09:58:18 [serving_chat.py:200] Input should be a valid dictionary [type=dict_type, input_value="Thought: 我看到当前...key(key='ctrl shift i')", input_type=str]
    ERROR 05-20 09:58:18 [serving_chat.py:200] For further information visit https://errors.pydantic.dev/2.11/v/dict_type
    ERROR 05-20 09:58:18 [serving_chat.py:200] 0.ChatCompletionContentPartRefusalParam
    ERROR 05-20 09:58:18 [serving_chat.py:200] Input should be a valid dictionary [type=dict_type, input_value="Thought: 我看到当前...key(key='ctrl shift i')", input_type=str]
    ERROR 05-20 09:58:18 [serving_chat.py:200] For further information visit https://errors.pydantic.dev/2.11/v/dict_type
    INFO: 127.0.0.1:52696 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request

May 20 '25 01:05 chuheww

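> Two bugs have been found so far in the OSWorld implementation, and a PR has been submitted to fix them: xlang-ai/OSWorld#194

> Also, the error "local variable 'response' referenced before assignment" is caused by invalid Python code. To check whether the click position is correct, you need to visualize the trace.

Does the part that extends the history messages also need to be changed? Changing it to the following solves the problem for me:

    messages.append({
        "role": "assistant",
        "content": [
            {"type": "text", "text": add_box_token(history_response)}
        ]
    })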
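The validation error in the server log complains that a content part is a str where a dict is expected. A defensive sketch of normalizing message content before sending is given below. The helper normalize_content is hypothetical and is not part of uitars_agent.py. The add_box_token call from the snippet above is left out to keep the example self-contained.

```python
def normalize_content(content):
    """Return content as a list of dict parts, wrapping bare strings as text parts."""
    if isinstance(content, str):
        return [{"type": "text", "text": content}]
    return [
        {"type": "text", "text": part} if isinstance(part, str) else part
        for part in content
    ]


messages = []
history_response = "Thought: open the developer tools"  # made-up example text
messages.append({
    "role": "assistant",
    # A list containing a bare string is the shape that failed validation.
    "content": normalize_content([history_response]),
})
```

Parts that are already dicts, such as image_url parts, pass through unchanged.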

May 20 '25 02:05 chuheww

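> > Have you managed to reproduce the results now?

> Testing is very slow on my side, and the results are not out yet.

Hello, I cannot reproduce the results on my side. Did your tests ever finish?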

Jun 19 '25 04:06 super-jw