[Bug]: UI-TARS-2B-SFT model maximum context length Error
Version
v0.1.1
Model
UI-TARS-2B-SFT
Deployment Method
Local
Issue Description
- Start command:
  CUDA_VISIBLE_DEVICES=0,1 python -m vllm.entrypoints.openai.api_server --served-model-name ui-tars --dtype=half --tensor-parallel-size 2 --trust-remote-code --model ./UI-TARS-2B-SFT/ --limit-mm-per-prompt "image=6" --gpu_memory_utilization 0.6
- Start a chat.
- The error below is logged.
Error Logs
ERROR 05-13 11:18:37 [serving_chat.py:200] Error in preprocessing prompt inputs
ERROR 05-13 11:18:37 [serving_chat.py:200] Traceback (most recent call last):
ERROR 05-13 11:18:37 [serving_chat.py:200]   File "/root/anaconda3/envs/unsloth_env/lib/python3.11/site-packages/vllm/entrypoints/openai/serving_chat.py", line 183, in create_chat_completion
ERROR 05-13 11:18:37 [serving_chat.py:200]     ) = await self._preprocess_chat(
ERROR 05-13 11:18:37 [serving_chat.py:200]         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 05-13 11:18:37 [serving_chat.py:200]   File "/root/anaconda3/envs/unsloth_env/lib/python3.11/site-packages/vllm/entrypoints/openai/serving_engine.py", line 439, in _preprocess_chat
ERROR 05-13 11:18:37 [serving_chat.py:200]     prompt_inputs = await self._tokenize_prompt_input_async(
ERROR 05-13 11:18:37 [serving_chat.py:200]                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 05-13 11:18:37 [serving_chat.py:200]   File "/root/anaconda3/envs/unsloth_env/lib/python3.11/concurrent/futures/thread.py", line 58, in run
ERROR 05-13 11:18:37 [serving_chat.py:200]     result = self.fn(*self.args, **self.kwargs)
ERROR 05-13 11:18:37 [serving_chat.py:200]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 05-13 11:18:37 [serving_chat.py:200]   File "/root/anaconda3/envs/unsloth_env/lib/python3.11/site-packages/vllm/entrypoints/openai/serving_engine.py", line 269, in _tokenize_prompt_input
ERROR 05-13 11:18:37 [serving_chat.py:200]     return next(
ERROR 05-13 11:18:37 [serving_chat.py:200]            ^^^^^
ERROR 05-13 11:18:37 [serving_chat.py:200]   File "/root/anaconda3/envs/unsloth_env/lib/python3.11/site-packages/vllm/entrypoints/openai/serving_engine.py", line 292, in _tokenize_prompt_inputs
ERROR 05-13 11:18:37 [serving_chat.py:200]     yield self._normalize_prompt_text_to_input(
ERROR 05-13 11:18:37 [serving_chat.py:200]           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 05-13 11:18:37 [serving_chat.py:200]   File "/root/anaconda3/envs/unsloth_env/lib/python3.11/site-packages/vllm/entrypoints/openai/serving_engine.py", line 184, in _normalize_prompt_text_to_input
ERROR 05-13 11:18:37 [serving_chat.py:200]     return self._validate_input(request, input_ids, input_text)
ERROR 05-13 11:18:37 [serving_chat.py:200]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 05-13 11:18:37 [serving_chat.py:200]   File "/root/anaconda3/envs/unsloth_env/lib/python3.11/site-packages/vllm/entrypoints/openai/serving_engine.py", line 247, in _validate_input
ERROR 05-13 11:18:37 [serving_chat.py:200]     raise ValueError(
ERROR 05-13 11:18:37 [serving_chat.py:200] ValueError: This model's maximum context length is 32768 tokens. However, you requested 65875 tokens (340 in the messages, 65535 in the completion). Please reduce the length of the messages or completion.
INFO:     172.20.1.4:8170 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request
ERROR 05-13 11:18:37 [serving_chat.py:200] ValueError: This model's maximum context length is 32768 tokens. However, you requested 65875 tokens (340 in the messages, 65535 in the completion). Please reduce the length of the messages or completion.
I see this error. How can I solve it? @maxwell-feng
Same issue here
Reduce the length of your text.
@maxwell-feng How do I reduce the text? The error is raised as soon as I start it up; I haven't specified or typed any text at all.
I suggest re-pulling the repo and reinstalling. Don't change the tokens setting.
I have the same question. How can it be fixed?
Not for now. I'm trying to re-download the model, but I don't think there's much chance of fixing it.
Hi! I ran into a similar issue and found a workaround that worked for me — just in case it's helpful:
1. Clone the project to your local environment:
   git clone https://github.com/bytedance/ui-tars-desktop.git
   cd ui-tars-desktop
2. Open the file UI-TARS-desktop/packages/ui-tars/sdk/src/Model.ts and locate the following line:
   const max_tokens = uiTarsVersion == UITarsModelVersion.V1_5 ? 65535 : 1000;
3. Change 65535 to a value less than 32768 (I used 30000); a sketch of the edited line follows this comment.
4. Then follow the deployment steps in CONTRIBUTING.md:
   pnpm install
   pnpm run dev:ui-tars
Let me know if that works for you — hope it helps!
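For reference, a minimal sketch of what that line might look like after step 3. It assumes only the V1_5 budget is lowered and nothing else in Model.ts changes; the exact surrounding code in the repository may differ.

// packages/ui-tars/sdk/src/Model.ts (sketch of the edit, not the full file)
// Keep the completion budget below the 32768-token context window of UI-TARS-2B-SFT,
// so prompt tokens + completion tokens can still fit in a single request.
const max_tokens = uiTarsVersion == UITarsModelVersion.V1_5 ? 30000 : 1000; // was 65535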
If you solve it, please let me know.
How do I do this on Windows?
I'm following this method on Windows.
Re-pulling the model didn't solve the issue, so I tried building it with KJ-Chang's method.
Hello, I followed these steps and it worked at first, but my task chain is relatively long. After running for a while it reported an error again and got stuck. What might be causing this?
Thanks
ValueError: This model's maximum context length is 32768 tokens. However, you requested 32819 tokens (2819 in the messages, 30000 in the completion). Please reduce the length of the messages or completion
Hi, I believe this error is similar to the previous one. The message indicates that there are 2,819 tokens in the messages and 30,000 in the completion, which adds up to 32,819 tokens — exceeding the model’s maximum context length of 32,768 tokens. So if you reduce the value you previously set to 30,000 to something a bit lower, it should work without any issues.
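A small sketch of how that trade-off could be handled automatically instead of hand-tuning the constant. This is not how the SDK currently works; the names below (CONTEXT_LIMIT, SAFETY_MARGIN, completionBudget) are assumptions for illustration only.

// Keep prompt tokens + completion tokens within the model's context window.
const CONTEXT_LIMIT = 32768;   // UI-TARS-2B-SFT context window (from the error message)
const SAFETY_MARGIN = 256;     // headroom for chat-template/special tokens (assumed value)

function completionBudget(promptTokens: number): number {
  // Whatever the prompt already uses, only ask for the remainder as completion.
  return Math.max(1, CONTEXT_LIMIT - SAFETY_MARGIN - promptTokens);
}

// Example from the error above: 2819 prompt tokens leave a budget of 29693,
// so the request stays under 32768 even as a long task chain keeps growing.
const max_tokens = Math.min(30000, completionBudget(2819));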
Thank you for your response. The issue has been resolved. My solution was to set the maximum token length to 65535 when deploying and starting the model, and also to set it to 65535 on the client side. This allowed my long task chain to run smoothly to completion. Thank you again.
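For anyone trying to reproduce that fix: the comment above does not name the exact server-side option, but in vLLM the context window is controlled by the --max-model-len flag, so the deployment change presumably looked roughly like the original start command with that flag added. This is an assumption on my part; whether a 65535-token window is actually usable depends on the checkpoint's config (vLLM may reject values larger than the length derived from the model), and the client-side max_tokens must still leave room for the prompt, as discussed earlier in the thread.

# Sketch, not verified: original start command plus an explicit context length.
CUDA_VISIBLE_DEVICES=0,1 python -m vllm.entrypoints.openai.api_server \
  --served-model-name ui-tars \
  --dtype=half \
  --tensor-parallel-size 2 \
  --trust-remote-code \
  --model ./UI-TARS-2B-SFT/ \
  --limit-mm-per-prompt "image=6" \
  --gpu_memory_utilization 0.6 \
  --max-model-len 65535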