[Bug]: UI-TARS-2B-SFT model maximum context length Error
Version
v0.1.1
Model
UI-TARS-2B-SFT
Deployment Method
Local
Issue Description
- Start command:
  CUDA_VISIBLE_DEVICES=0,1 python -m vllm.entrypoints.openai.api_server --served-model-name ui-tars --dtype=half --tensor-parallel-size 2 --trust-remote-code --model ./UI-TARS-2B-SFT/ --limit-mm-per-prompt "image=6" --gpu_memory_utilization 0.6
- Start a chat.
- The error below is logged.
Error Logs
ERROR 05-13 11:18:37 [serving_chat.py:200] Error in preprocessing prompt inputs
ERROR 05-13 11:18:37 [serving_chat.py:200] Traceback (most recent call last):
ERROR 05-13 11:18:37 [serving_chat.py:200]   File "/root/anaconda3/envs/unsloth_env/lib/python3.11/site-packages/vllm/entrypoints/openai/serving_chat.py", line 183, in create_chat_completion
ERROR 05-13 11:18:37 [serving_chat.py:200]     ) = await self._preprocess_chat(
ERROR 05-13 11:18:37 [serving_chat.py:200]         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 05-13 11:18:37 [serving_chat.py:200]   File "/root/anaconda3/envs/unsloth_env/lib/python3.11/site-packages/vllm/entrypoints/openai/serving_engine.py", line 439, in _preprocess_chat
ERROR 05-13 11:18:37 [serving_chat.py:200]     prompt_inputs = await self._tokenize_prompt_input_async(
ERROR 05-13 11:18:37 [serving_chat.py:200]                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 05-13 11:18:37 [serving_chat.py:200]   File "/root/anaconda3/envs/unsloth_env/lib/python3.11/concurrent/futures/thread.py", line 58, in run
ERROR 05-13 11:18:37 [serving_chat.py:200]     result = self.fn(*self.args, **self.kwargs)
ERROR 05-13 11:18:37 [serving_chat.py:200]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 05-13 11:18:37 [serving_chat.py:200]   File "/root/anaconda3/envs/unsloth_env/lib/python3.11/site-packages/vllm/entrypoints/openai/serving_engine.py", line 269, in _tokenize_prompt_input
ERROR 05-13 11:18:37 [serving_chat.py:200]     return next(
ERROR 05-13 11:18:37 [serving_chat.py:200]            ^^^^^
ERROR 05-13 11:18:37 [serving_chat.py:200]   File "/root/anaconda3/envs/unsloth_env/lib/python3.11/site-packages/vllm/entrypoints/openai/serving_engine.py", line 292, in _tokenize_prompt_inputs
ERROR 05-13 11:18:37 [serving_chat.py:200]     yield self._normalize_prompt_text_to_input(
ERROR 05-13 11:18:37 [serving_chat.py:200]           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 05-13 11:18:37 [serving_chat.py:200]   File "/root/anaconda3/envs/unsloth_env/lib/python3.11/site-packages/vllm/entrypoints/openai/serving_engine.py", line 184, in _normalize_prompt_text_to_input
ERROR 05-13 11:18:37 [serving_chat.py:200]     return self._validate_input(request, input_ids, input_text)
ERROR 05-13 11:18:37 [serving_chat.py:200]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 05-13 11:18:37 [serving_chat.py:200]   File "/root/anaconda3/envs/unsloth_env/lib/python3.11/site-packages/vllm/entrypoints/openai/serving_engine.py", line 247, in _validate_input
ERROR 05-13 11:18:37 [serving_chat.py:200]     raise ValueError(
ERROR 05-13 11:18:37 [serving_chat.py:200] ValueError: This model's maximum context length is 32768 tokens. However, you requested 65875 tokens (340 in the messages, 65535 in the completion). Please reduce the length of the messages or completion.
INFO:     172.20.1.4:8170 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request
ERROR 05-13 11:18:37 [serving_chat.py:200] ValueError: This model's maximum context length is 32768 tokens. However, you requested 65875 tokens (340 in the messages, 65535 in the completion). Please reduce the length of the messages or completion.
I see this error. How can I solve it? @maxwell-feng
Same issue here
Reduce the length of your text.
@maxwell-feng How do I reduce the text? The error is raised as soon as I start it up; I haven't specified or typed any text at all.
I suggest re-pulling the repo and reinstalling. Don't change the tokens setting.
I have the same question. How can it be fixed?
Not for now. I'm trying to re-download the model, but I don't think there's much chance of fixing it.
Hi! I ran into a similar issue and found a workaround that worked for me — just in case it's helpful:
1. Clone the project to your local environment:
   git clone https://github.com/bytedance/ui-tars-desktop.git
   cd ui-tars-desktop
2. Open the file UI-TARS-desktop/packages/ui-tars/sdk/src/Model.ts and locate the following line:
   const max_tokens = uiTarsVersion == UITarsModelVersion.V1_5 ? 65535 : 1000;
3. Change 65535 to a value less than 32768 (I used 30000); a sketch of the edited line follows this comment.
4. Then follow the deployment steps in CONTRIBUTING.md:
   pnpm install
   pnpm run dev:ui-tars
Let me know if that works for you — hope it helps!
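For reference, a minimal sketch of what that line might look like after step 3. It assumes only the V1_5 budget is lowered and nothing else in Model.ts changes; the exact surrounding code in the repository may differ.

// packages/ui-tars/sdk/src/Model.ts (sketch of the edit, not the full file)
// Keep the completion budget below the 32768-token context window of UI-TARS-2B-SFT,
// so prompt tokens + completion tokens can still fit in a single request.
const max_tokens = uiTarsVersion == UITarsModelVersion.V1_5 ? 30000 : 1000; // was 65535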
If you solve it, please let me know.
How do I do this on Windows?
I'm following this method on Windows.
Re-pulling the model didn't solve the issue, so I tried building it with KJ-Chang's method.
Hello, I followed these steps and it worked at first, but my task chain is relatively long. After running for a while it reported an error again and got stuck. What might be causing this?
Thanks
ValueError: This model's maximum context length is 32768 tokens. However, you requested 32819 tokens (2819 in the messages, 30000 in the completion). Please reduce the length of the messages or completion
Hi, I believe this error is similar to the previous one. The message indicates that there are 2,819 tokens in the messages and 30,000 in the completion, which adds up to 32,819 tokens — exceeding the model’s maximum context length of 32,768 tokens. So if you reduce the value you previously set to 30,000 to something a bit lower, it should work without any issues.
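A small sketch of how that trade-off could be handled automatically instead of hand-tuning the constant. This is not how the SDK currently works; the names below (CONTEXT_LIMIT, SAFETY_MARGIN, completionBudget) are assumptions for illustration only.

// Keep prompt tokens + completion tokens within the model's context window.
const CONTEXT_LIMIT = 32768;   // UI-TARS-2B-SFT context window (from the error message)
const SAFETY_MARGIN = 256;     // headroom for chat-template/special tokens (assumed value)

function completionBudget(promptTokens: number): number {
  // Whatever the prompt already uses, only ask for the remainder as completion.
  return Math.max(1, CONTEXT_LIMIT - SAFETY_MARGIN - promptTokens);
}

// Example from the error above: 2819 prompt tokens leave a budget of 29693,
// so the request stays under 32768 even as a long task chain keeps growing.
const max_tokens = Math.min(30000, completionBudget(2819));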
Thank you for your response. The issue has been resolved. My solution was to set the maximum token length to 65535 when deploying and starting the model, and also to set it to 65535 on the client side. This allowed my long task chain to run smoothly to completion. Thank you again.
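For anyone trying to reproduce that fix: the comment above does not name the exact server-side option, but in vLLM the context window is controlled by the --max-model-len flag, so the deployment change presumably looked roughly like the original start command with that flag added. This is an assumption on my part; whether a 65535-token window is actually usable depends on the checkpoint's config (vLLM may reject values larger than the length derived from the model), and the client-side max_tokens must still leave room for the prompt, as discussed earlier in the thread.

# Sketch, not verified: original start command plus an explicit context length.
CUDA_VISIBLE_DEVICES=0,1 python -m vllm.entrypoints.openai.api_server \
  --served-model-name ui-tars \
  --dtype=half \
  --tensor-parallel-size 2 \
  --trust-remote-code \
  --model ./UI-TARS-2B-SFT/ \
  --limit-mm-per-prompt "image=6" \
  --gpu_memory_utilization 0.6 \
  --max-model-len 65535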