[Bug]: When using Docker vllm/vllm-openai:v0.7.2 to deploy DeepSeek-R1 AWQ, I get empty content
Your current environment
The output of `python collect_env.py`
🐛 Describe the bug
Device: 8 × H100

```shell
python3 -m vllm.entrypoints.openai.api_server --host 0.0.0.0 --port 23333 \
  --max-model-len 60000 --trust-remote-code --tensor-parallel-size 8 \
  --quantization moe_wna16 --gpu-memory-utilization 0.92 \
  --kv-cache-dtype fp8_e5m2 --calculate-kv-scales \
  --served-model-name deepseek-reasoner \
  --model cognitivecomputations/DeepSeek-R1-AWQ
```
Request ("你是谁" means "Who are you?"):

```shell
curl http://localhost:23333/v1/chat/completions -H "Content-Type: application/json" -d '{"model": "deepseek-reasoner", "messages": [{"role": "user", "content": "你是谁"}], "stream": true, "temperature": 1.2}'
```

Streamed response — every chunk carries an empty `content`:

```
data: {"id":"chatcmpl-c7e88282efa547cfba27b429df7df593","object":"chat.completion.chunk","created":1739440234,"model":"deepseek-reasoner","choices":[{"index":0,"delta":{"role":"assistant","content":""},"logprobs":null,"finish_reason":null}]}

data: {"id":"chatcmpl-c7e88282efa547cfba27b429df7df593","object":"chat.completion.chunk","created":1739440234,"model":"deepseek-reasoner","choices":[{"index":0,"delta":{"content":""},"logprobs":null,"finish_reason":null}]}
```

(The remaining chunks are identical, all with empty `content`.)
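To confirm programmatically that the stream never produces any text, the SSE chunks above can be parsed and their deltas concatenated. A minimal sketch in Python — the sample lines are copied from the captured response above, and no live server is assumed:

```python
import json

def collect_stream_content(sse_lines):
    """Concatenate the delta `content` fields from OpenAI-style SSE chunks."""
    text = []
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines between events
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # end-of-stream sentinel
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0].get("delta", {})
        text.append(delta.get("content") or "")
    return "".join(text)

# Chunks captured from the failing deployment: every delta content is "".
sample = [
    'data: {"id":"chatcmpl-c7e88282efa547cfba27b429df7df593","object":"chat.completion.chunk","created":1739440234,"model":"deepseek-reasoner","choices":[{"index":0,"delta":{"role":"assistant","content":""},"logprobs":null,"finish_reason":null}]}',
    'data: {"id":"chatcmpl-c7e88282efa547cfba27b429df7df593","object":"chat.completion.chunk","created":1739440234,"model":"deepseek-reasoner","choices":[{"index":0,"delta":{"content":""},"logprobs":null,"finish_reason":null}]}',
]

print(repr(collect_stream_content(sample)))  # → ''
```

An empty result here means the model emitted only empty deltas, i.e. the bug is in generation, not in client-side parsing.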
Before submitting a new issue...
- [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
Hello, I hit the same problem with 2 nodes of 4 × A100 80G. Have you found a solution yet?
Same here on 8 × A800 80G with vllm/vllm-openai:latest.
Success on 8 × A800 80G:

```shell
VLLM_WORKER_MULTIPROC_METHOD=spawn vllm serve /cognitivecomputations/DeepSeek-R1-AWQ \
  --host 0.0.0.0 --port 12345 --max-model-len 16384 --max-num-batched-tokens 16384 \
  --trust-remote-code --tensor-parallel-size 8 --gpu-memory-utilization 0.95 \
  --dtype float16 --enable-reasoning --reasoning-parser deepseek_r1 \
  --served-model-name deepseek-reasoner --enforce-eager
```
However, MLA is not supported with this configuration, and throughput is only about 5 tokens per second.
This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!
This issue has been automatically closed due to inactivity. Please feel free to reopen if you feel it is still relevant. Thank you!