[Usage]: Input prompt (2501 tokens) is too long and exceeds limit of 2048
Your current environment
env: 16*H800
model: DeepSeek-R1
version: 0.7.2
start script: python3 -m vllm.entrypoints.openai.api_server --host 0.0.0.0 --port 80 --max-model-len 128000 --trust-remote-code --pipeline-parallel-size 2 --tensor-parallel-size 8 --gpu-memory-utilization 0.8 --served-model-name deepseek --model /mnt/workspace/models/public-models/llm/DeepSeek-R1
WARNING 02-16 15:37:32 scheduler.py:947] Input prompt (2501 tokens) is too long and exceeds limit of 2048
INFO 02-16 15:37:32 metrics.py:453] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 9.3 tokens/s, Running: 0 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%.
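For reference, a minimal sketch of the kind of request that triggers this warning, assuming the server started with the script above is reachable at http://localhost:80/v1 and serves the model under the name "deepseek":

```python
# Minimal repro sketch (assumptions: base_url and served model name match
# the start script above; any prompt longer than 2048 tokens triggers the
# scheduler warning).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:80/v1", api_key="EMPTY")

# Roughly 2500+ tokens of input, enough to exceed a 2048-token limit.
long_prompt = "hello " * 2600

resp = client.completions.create(
    model="deepseek",
    prompt=long_prompt,
    max_tokens=64,
)
print(resp.choices[0].text)
```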
How would you like to use vllm
Which parameter of the startup command can solve this problem?
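One way to narrow this down is to ask the running server what limit it is actually enforcing. The sketch below queries the OpenAI-compatible /v1/models endpoint; depending on the vLLM version, the model card in the response may report the effective max_model_len (an assumption to verify against your build), which helps confirm whether the 2048 limit is coming from the server configuration or from somewhere else.

```python
# Sketch: inspect what the running server reports about its models.
# Assumption: server is reachable at http://localhost:80 as in the start script.
import json
import urllib.request

with urllib.request.urlopen("http://localhost:80/v1/models") as r:
    # Pretty-print the raw response; look for a max_model_len field if present.
    print(json.dumps(json.load(r), indent=2))
```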
Before submitting a new issue...
- [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.