UI-TARS-desktop
[Bug]: At most 1 image(s) may be provided in one request; 400 Bad Request
Version
Agent-TARS-v1.0.0-alpha.7
Model
UI-TARS-7B-DPO
Deployment Method
Cloud
Issue Description
It can only carry out a one-step request, such as 'open Chrome'. After finishing that step, it always takes another screenshot and then reports an error. Sometimes it also fails to find the right location on screen.
Error Logs
ERROR 04-09 17:23:46 [serving_chat.py:198] Error in preprocessing prompt inputs
ERROR 04-09 17:23:46 [serving_chat.py:198] Traceback (most recent call last):
ERROR 04-09 17:23:46 [serving_chat.py:198] File "/mnt/speech/ruiyu/miniconda3/envs/tars/lib/python3.10/site-packages/vllm/entrypoints/openai/serving_chat.py", line 181, in create_chat_completion
ERROR 04-09 17:23:46 [serving_chat.py:198] ) = await self._preprocess_chat(
ERROR 04-09 17:23:46 [serving_chat.py:198] File "/mnt/speech/ruiyu/miniconda3/envs/tars/lib/python3.10/site-packages/vllm/entrypoints/openai/serving_engine.py", line 391, in _preprocess_chat
ERROR 04-09 17:23:46 [serving_chat.py:198] conversation, mm_data_future = parse_chat_messages_futures(
ERROR 04-09 17:23:46 [serving_chat.py:198] File "/mnt/speech/ruiyu/miniconda3/envs/tars/lib/python3.10/site-packages/vllm/entrypoints/chat_utils.py", line 1139, in parse_chat_messages_futures
ERROR 04-09 17:23:46 [serving_chat.py:198] sub_messages = _parse_chat_message_content(
ERROR 04-09 17:23:46 [serving_chat.py:198] File "/mnt/speech/ruiyu/miniconda3/envs/tars/lib/python3.10/site-packages/vllm/entrypoints/chat_utils.py", line 1067, in _parse_chat_message_content
ERROR 04-09 17:23:46 [serving_chat.py:198] result = _parse_chat_message_content_parts(
ERROR 04-09 17:23:46 [serving_chat.py:198] File "/mnt/speech/ruiyu/miniconda3/envs/tars/lib/python3.10/site-packages/vllm/entrypoints/chat_utils.py", line 967, in _parse_chat_message_content_parts
ERROR 04-09 17:23:46 [serving_chat.py:198] parse_res = _parse_chat_message_content_part(
ERROR 04-09 17:23:46 [serving_chat.py:198] File "/mnt/speech/ruiyu/miniconda3/envs/tars/lib/python3.10/site-packages/vllm/entrypoints/chat_utils.py", line 1024, in _parse_chat_message_content_part
ERROR 04-09 17:23:46 [serving_chat.py:198] mm_parser.parse_image(str_content)
ERROR 04-09 17:23:46 [serving_chat.py:198] File "/mnt/speech/ruiyu/miniconda3/envs/tars/lib/python3.10/site-packages/vllm/entrypoints/chat_utils.py", line 725, in parse_image
ERROR 04-09 17:23:46 [serving_chat.py:198] placeholder = self._tracker.add("image", image_coro)
ERROR 04-09 17:23:46 [serving_chat.py:198] File "/mnt/speech/ruiyu/miniconda3/envs/tars/lib/python3.10/site-packages/vllm/entrypoints/chat_utils.py", line 548, in add
ERROR 04-09 17:23:46 [serving_chat.py:198] raise ValueError(
ERROR 04-09 17:23:46 [serving_chat.py:198] ValueError: At most 1 image(s) may be provided in one request.
INFO: 127.0.0.1:32910 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request
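The traceback shows vLLM rejecting the request while it parses image parts, which fits the symptom above: the first step sends one screenshot, and the follow-up request probably carries the earlier screenshot plus the new one. The sketch below is illustrative only. It assumes OpenAI-style chat messages whose `content` is a list of parts with `type: "image_url"`, and the helper names `count_images` and `keep_last_images` are hypothetical, not part of Agent-TARS or vLLM. It shows how a client could count image parts and keep only the most recent ones before sending.

```python
from typing import Any

Message = dict[str, Any]


def count_images(messages: list[Message]) -> int:
    """Count image_url content parts across OpenAI-style chat messages."""
    total = 0
    for message in messages:
        content = message.get("content")
        if isinstance(content, list):
            total += sum(
                1
                for part in content
                if isinstance(part, dict) and part.get("type") == "image_url"
            )
    return total


def keep_last_images(messages: list[Message], max_images: int = 1) -> list[Message]:
    """Return a copy of messages keeping only the newest max_images image parts.

    Hypothetical helper: older image parts are dropped, text parts are kept.
    A message that held only an old image ends up with an empty content list.
    """
    remaining = max_images
    trimmed: list[Message] = []
    # Walk from newest to oldest so the most recent screenshots survive.
    for message in reversed(messages):
        content = message.get("content")
        if not isinstance(content, list):
            trimmed.append(message)
            continue
        kept = []
        for part in reversed(content):
            if isinstance(part, dict) and part.get("type") == "image_url":
                if remaining <= 0:
                    continue  # over the limit: drop this older image
                remaining -= 1
            kept.append(part)
        kept.reverse()
        trimmed.append({**message, "content": kept})
    trimmed.reverse()
    return trimmed
```

Trimming history this way only avoids the 400 response; it does not make an unsupported model work with Agent-TARS. vLLM also has a `--limit-mm-per-prompt` server option that controls this per-request limit, but its exact syntax depends on the vLLM version, so check the documentation for the version you run.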
+1 same issue
Thank you for your detailed feedback and analysis.
The UI-TARS-7B-DPO model is currently designed to work only with UI-TARS Desktop and supports GUI Agent-related capabilities. It does not yet support integration with Agent-TARS, which is likely causing the issues you observed.
Some clarifications
These are two applications:
- UI TARS Desktop is our first GUI Agent application focused on controlling computers, with its latest version being v0.0.8, which supports both Mac and Windows.
- Agent TARS App is our new application focused on browser agents, with its latest version being Agent-TARS-v1.0.0-alpha.7. Since it is still in the technical preview stage, it currently supports only Mac.
Suggestions
- Use a model compatible with Agent-TARS: https://github.com/bytedance/UI-TARS-desktop/discussions/377
- Future Development: In the long term, we plan to integrate UI-TARS models with Agent-TARS. This is under technical research, so stay tuned for updates.
Same issue with UI-TARS-1.5-7B.