
OSWorld cannot parse the response

Open JYX1216 opened this issue 8 months ago • 10 comments

I pulled the latest OSWorld code and ran into the following problems while trying to reproduce the results:

1. The project calls the chat.completions.create() method of the OpenAI (or a similar) interface. However, this method does not support the top_k parameter, so an error is reported.

Image

2. The model parameter does not need to be passed when initializing the Agent.

Image

This parameter is passed in run_uitars.py, and the problem goes away after commenting it out.

Image

3. After I solved the two problems above and started the reproduction, a new problem appeared.

Image

It seems that the model's response cannot be parsed.
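For the first problem, one possible workaround, assuming the model sits behind an OpenAI-compatible server such as vLLM: the openai Python client's chat.completions.create() takes an extra_body argument, so a non-standard sampling parameter like top_k can be moved there instead of being passed as a keyword argument. The helper name split_sampling_params and the NON_STANDARD_KEYS set below are illustrative and are not taken from the OSWorld code.

```python
# Keys that chat.completions.create() does not accept as keyword arguments.
# This set is an assumption for illustration; extend it for your server.
NON_STANDARD_KEYS = {"top_k"}


def split_sampling_params(params):
    """Move non-standard sampling params into an extra_body dict."""
    standard = {k: v for k, v in params.items() if k not in NON_STANDARD_KEYS}
    extra = {k: v for k, v in params.items() if k in NON_STANDARD_KEYS}
    if extra:
        standard["extra_body"] = extra
    return standard


kwargs = split_sampling_params({"temperature": 0.0, "top_p": 0.9, "top_k": 1})
# kwargs can then be forwarded, e.g.:
#   client.chat.completions.create(model=..., messages=..., **kwargs)
```

Whether the server honours top_k in the request body depends on the backend, so this is worth checking against the deployment in use.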

JYX1216 avatar Apr 22 '25 12:04 JYX1216

I've also encountered the same problem.

ZFish-Lu avatar Apr 24 '25 12:04 ZFish-Lu

@ZFish-Lu The problem seems to be caused by the VLM's input limit. After I adjusted history_n (the amount of input history), the problem went away. Adjusting the max_pixels parameter might also solve it, but that may affect the model's performance.
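A rough sketch of what limiting the history can look like, assuming history_n bounds how many past screenshots are sent to the model. The function names below are hypothetical and this is not the agent's actual implementation:

```python
def has_image(message):
    """True if the message content is a list containing an image_url part."""
    content = message["content"]
    return isinstance(content, list) and any(
        isinstance(p, dict) and p.get("type") == "image_url" for p in content
    )


def keep_last_n_images(messages, history_n):
    """Keep images only in the last `history_n` image-bearing messages."""
    image_positions = [i for i, m in enumerate(messages) if has_image(m)]
    keep = set(image_positions[-history_n:]) if history_n > 0 else set()
    trimmed = []
    for i, m in enumerate(messages):
        if has_image(m) and i not in keep:
            parts = [
                p for p in m["content"]
                if not (isinstance(p, dict) and p.get("type") == "image_url")
            ]
            if not parts:
                continue  # the message held only an image, so drop it
            m = {**m, "content": parts}
        trimmed.append(m)
    return trimmed


def screenshot(tag):
    """Build a placeholder screenshot message (the URL is not a real image)."""
    return {"role": "user", "content": [{"type": "image_url", "image_url": {"url": tag}}]}


history = [
    screenshot("s1"),
    {"role": "assistant", "content": "step 1"},
    screenshot("s2"),
    {"role": "assistant", "content": "step 2"},
    screenshot("s3"),
]
trimmed = keep_last_n_images(history, history_n=2)
```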

JYX1216 avatar Apr 25 '25 02:04 JYX1216

In uitars_agent.py, OSWorld's evaluation code, a bug occurs when observation_type is screenshot: Invalid observation_type type: screenshot. This part of the code seems to have a problem.

Image

ZFish-Lu avatar Apr 25 '25 07:04 ZFish-Lu

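When running OSWorld I hit the bug below. The same error occurs whether the model is deployed with vLLM or with transformers. The server-side log suggests that the format of the message being sent is wrong.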

Image

server.log

ZFish-Lu avatar Apr 29 '25 15:04 ZFish-Lu

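> In uitars_agent.py, OSWorld's evaluation code, a bug occurs when observation_type is screenshot: Invalid observation_type type: screenshot. This part of the code seems to have a problem.
>
> Image

I set the observation type directly to screenshot_a11_tree; when predict builds the prompt, only the screenshot is passed in.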

JYX1216 avatar Apr 30 '25 02:04 JYX1216

> @ZFish-Lu The problem seems to be caused by the VLM's input limit. After I adjusted history_n (the amount of input history), the problem went away. Adjusting the max_pixels parameter might also solve it, but that may affect the model's performance.

I still have the bug mentioned above. Can you reproduce it successfully now?

ZFish-Lu avatar Apr 30 '25 02:04 ZFish-Lu

> Can you reproduce it successfully now?

Testing is very slow on my side; the results are not out yet.

JYX1216 avatar Apr 30 '25 03:04 JYX1216

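> When running OSWorld I hit the bug below. The same error occurs whether the model is deployed with vLLM or with transformers. The server-side log suggests that the format of the message being sent is wrong.
>
> Image [server.log](https://github.com/user-attachments/files/19961586/server.log)

The problem was in how the evaluation code assembles the messages. It now runs successfully.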

ZFish-Lu avatar Apr 30 '25 06:04 ZFish-Lu

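> When running OSWorld I hit the bug below. The same error occurs whether the model is deployed with vLLM or with transformers. The server-side log suggests that the format of the message being sent is wrong. Image server.log
>
> The problem was in how the evaluation code assembles the messages. It now runs successfully.

Yes, I had the same problem earlier. The message should be written in dict format.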

JYX1216 avatar Apr 30 '25 08:04 JYX1216

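> When running OSWorld I hit the bug below. The same error occurs whether the model is deployed with vLLM or with transformers. The server-side log suggests that the format of the message being sent is wrong. Image server.log
>
> The problem was in how the evaluation code assembles the messages. It now runs successfully.
>
> Yes, I had the same problem earlier. The message should be written in dict format.

The image in the message should be passed in the following format: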

{ "role": "user", "content": [{"type": "image_url", "image_url": {"url": f"data:image/png;base64,{encoded_string}"}}] }
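As a small sketch of how such a message can be built from raw image bytes: the function name build_image_message is hypothetical, and the placeholder bytes below are only the PNG signature, not a real screenshot.

```python
import base64


def build_image_message(png_bytes):
    """Wrap raw PNG bytes as a user message with one image_url content part."""
    encoded_string = base64.b64encode(png_bytes).decode("utf-8")
    return {
        "role": "user",
        "content": [
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/png;base64,{encoded_string}"},
            }
        ],
    }


# Placeholder input: the 8-byte PNG signature stands in for a screenshot.
message = build_image_message(b"\x89PNG\r\n\x1a\n")
```

The key point is that content is a list of dict parts, each with a type field, rather than a bare string or a list of strings.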

pooruss avatar May 01 '25 15:05 pooruss

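> When running OSWorld I hit the bug below. The same error occurs whether the model is deployed with vLLM or with transformers. The server-side log suggests that the format of the message being sent is wrong. Image server.log
>
> The problem was in how the evaluation code assembles the messages. It now runs successfully.

A question: in osworld the model_type is always qwen25vl. Did you deploy the 1.5 7B model with vLLM? If a 2B model is deployed and the parsed click positions are wrong, would that also cause this error: Exception in chrome/7b6c7e24-c58a-49fc-a5bb-d57b80e5b4c3: local variable 'response' referenced before assignment? Thanks.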

chuheww avatar May 19 '25 08:05 chuheww

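Two bugs have been found so far in the implementation on OSWorld, and a PR fixing them has been submitted: https://github.com/xlang-ai/OSWorld/pull/194

Also, the local variable 'response' referenced before assignment error is an error caused by invalid Python code. Whether the click position is correct has to be checked by visualizing the trace.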
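For context, this message is Python's UnboundLocalError. One common way to trigger it, which may or may not be what happens in the agent code, is to assign a name only inside a try block and then use it after the request has failed. The function names below are hypothetical:

```python
def call_model_buggy(send_request):
    """Assigns `response` only on success, then uses it unconditionally."""
    try:
        response = send_request()
    except Exception as exc:
        print(f"request failed: {exc}")
    return response  # raises UnboundLocalError if send_request() raised


def call_model_safe(send_request):
    """Binds `response` before the try block so it always exists."""
    response = None
    try:
        response = send_request()
    except Exception as exc:
        print(f"request failed: {exc}")
    return response


def failing_request():
    """Stand-in for a request that the server rejects."""
    raise RuntimeError("400 Bad Request")
```

With this pattern the UnboundLocalError is only a symptom; the message printed from the except branch points at the underlying failure.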

pooruss avatar May 19 '25 08:05 pooruss

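> Two bugs have been found so far in the implementation on OSWorld, and a PR fixing them has been submitted: xlang-ai/OSWorld#194
>
> Also, the local variable 'response' referenced before assignment error is an error caused by invalid Python code. Whether the click position is correct has to be checked by visualizing the trace.

Hello, I just pulled your run_uitars and uitars_agent scripts. I deployed the 2B-SFT model locally with vLLM and set the observation type directly to screenshot_a11_tree in run_uitars. It then reported local variable 'response' referenced before assignment, and the remaining tasks cannot proceed. Could you help me look into this?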

Image

chuheww avatar May 19 '25 09:05 chuheww

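> Two bugs have been found so far in the implementation on OSWorld, and a PR fixing them has been submitted: xlang-ai/OSWorld#194
>
> Also, the local variable 'response' referenced before assignment error is an error caused by invalid Python code. Whether the click position is correct has to be checked by visualizing the trace.

Hello, my first step generates a result normally, and it is correct, but the second step cannot generate a response and jumps straight to the local variable 'response' referenced before assignment error.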

Image

Image

I pulled your version of the message format, so why is there still an error?

INFO: 127.0.0.1:52696 - "POST /v1/chat/completions HTTP/1.1" 200 OK
INFO 05-20 09:58:18 [loggers.py:111] Engine 000: Avg prompt throughput: 300.5 tokens/s, Avg generation throughput: 8.8 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 7.4%
ERROR 05-20 09:58:18 [serving_chat.py:200] Error in preprocessing prompt inputs
ERROR 05-20 09:58:18 [serving_chat.py:200] Traceback (most recent call last):
ERROR 05-20 09:58:18 [serving_chat.py:200] File "/home/hello/miniforge3/envs/osworld1/lib/python3.9/site-packages/vllm/entrypoints/openai/serving_chat.py", line 183, in create_chat_completion
ERROR 05-20 09:58:18 [serving_chat.py:200] ) = await self._preprocess_chat(
ERROR 05-20 09:58:18 [serving_chat.py:200] File "/home/hello/miniforge3/envs/osworld1/lib/python3.9/site-packages/vllm/entrypoints/openai/serving_engine.py", line 403, in _preprocess_chat
ERROR 05-20 09:58:18 [serving_chat.py:200] conversation, mm_data_future = parse_chat_messages_futures(
ERROR 05-20 09:58:18 [serving_chat.py:200] File "/home/hello/miniforge3/envs/osworld1/lib/python3.9/site-packages/vllm/entrypoints/chat_utils.py", line 1165, in parse_chat_messages_futures
ERROR 05-20 09:58:18 [serving_chat.py:200] sub_messages = _parse_chat_message_content(
ERROR 05-20 09:58:18 [serving_chat.py:200] File "/home/hello/miniforge3/envs/osworld1/lib/python3.9/site-packages/vllm/entrypoints/chat_utils.py", line 1089, in _parse_chat_message_content
ERROR 05-20 09:58:18 [serving_chat.py:200] result = _parse_chat_message_content_parts(
ERROR 05-20 09:58:18 [serving_chat.py:200] File "/home/hello/miniforge3/envs/osworld1/lib/python3.9/site-packages/vllm/entrypoints/chat_utils.py", line 988, in _parse_chat_message_content_parts
ERROR 05-20 09:58:18 [serving_chat.py:200] for part in parts:
ERROR 05-20 09:58:18 [serving_chat.py:200] pydantic_core._pydantic_core.ValidationError: 2 validation errors for ValidatorIterator
ERROR 05-20 09:58:18 [serving_chat.py:200] 0.ChatCompletionContentPartTextParam
ERROR 05-20 09:58:18 [serving_chat.py:200] Input should be a valid dictionary [type=dict_type, input_value="Thought: 我看到当前...key(key='ctrl shift i')", input_type=str]
ERROR 05-20 09:58:18 [serving_chat.py:200] For further information visit https://errors.pydantic.dev/2.11/v/dict_type
ERROR 05-20 09:58:18 [serving_chat.py:200] 0.ChatCompletionContentPartRefusalParam
ERROR 05-20 09:58:18 [serving_chat.py:200] Input should be a valid dictionary [type=dict_type, input_value="Thought: 我看到当前...key(key='ctrl shift i')", input_type=str]
ERROR 05-20 09:58:18 [serving_chat.py:200] For further information visit https://errors.pydantic.dev/2.11/v/dict_type
INFO: 127.0.0.1:52696 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request

chuheww avatar May 20 '25 01:05 chuheww

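> Two bugs have been found so far in the implementation on OSWorld, and a PR fixing them has been submitted: xlang-ai/OSWorld#194
>
> Also, the local variable 'response' referenced before assignment error is an error caused by invalid Python code. Whether the click position is correct has to be checked by visualizing the trace.

Does the part that extends the history messages also need to be changed? Changing it to the following solves the problem for me:

messages.append({ "role": "assistant", "content": [ {"type": "text", "text": add_box_token(history_response)} ] })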
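A self-contained sketch of that change, with a small check for the failure mode reported by the server (a bare string inside a content list, where a dict part is expected). Here add_box_token is a pass-through stand-in for the agent's helper of the same name, and find_string_parts is a hypothetical debugging aid:

```python
def add_box_token(text):
    """Stand-in for the agent's helper; this version returns text unchanged."""
    return text


def append_history_response(messages, history_response):
    """Append a past model reply as an assistant message with a text part."""
    messages.append(
        {
            "role": "assistant",
            "content": [{"type": "text", "text": add_box_token(history_response)}],
        }
    )
    return messages


def find_string_parts(messages):
    """Return indices of messages whose content list holds bare strings."""
    return [
        i for i, m in enumerate(messages)
        if isinstance(m["content"], list)
        and any(isinstance(p, str) for p in m["content"])
    ]


good = append_history_response([], "Thought: ...\nAction: ...")
bad = [{"role": "assistant", "content": ["Thought: ...\nAction: ..."]}]
```

Running find_string_parts over the messages before sending them gives a quick way to spot entries that would be rejected with a dict_type validation error.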

chuheww avatar May 20 '25 02:05 chuheww

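> Can you reproduce it successfully now?
>
> Testing is very slow on my side; the results are not out yet.

Hello, I can't reproduce it on my side. Did you get any test results?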

super-jw avatar Jun 19 '25 04:06 super-jw