vllm
[Bug]: using Qwen3-8B, LLVM ERROR: Failed to compute parent layout for slice layout
Your current environment
The output of `python collect_env.py`
Your output of `python collect_env.py` here
🐛 Describe the bug
vLLM 0.8.5
vllm serve /root/model/Qwen3-8B --dtype half --port 8075 --gpu-memory-utilization 0.8
INFO: 115.239.217.175:36366 - "POST /v1/chat/completions HTTP/1.1" 200 OK
INFO 04-30 14:30:44 [engine.py:310] Added request chatcmpl-dbc9987ce4734f5b8321adfdb5ae22b7.
LLVM ERROR: Failed to compute parent layout for slice layout.
ERROR 04-30 14:30:50 [client.py:305] RuntimeError('Engine process (pid 2837204) died.')
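For context, the crash happens on an ordinary chat-completions request to the OpenAI-compatible endpoint. A minimal sketch of such a request against the server started above (the prompt and parameters here are placeholders, not the exact request from the log):

```bash
# Hypothetical reproduction request; the actual prompt/content is not known.
curl http://localhost:8075/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "/root/model/Qwen3-8B",
        "messages": [{"role": "user", "content": "Hello"}],
        "max_tokens": 64
      }'
```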
Before submitting a new issue...
- [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
Have you tried v0?
VLLM_USE_V1=0 vllm serve ....
Similar issue: https://github.com/vllm-project/vllm/issues/17392
VLLM_USE_V1=0 does not work for me either:
docker run --gpus all -e VLLM_USE_V1=0 -v ~/.cache/huggingface:/root/.cache/huggingface \
  -p 8000:8000 --ipc=host vllm/vllm-openai:v0.8.5 \
  --model Qwen/Qwen3-30B-A3B --tensor-parallel-size 4 --dtype=half \
  --enable-reasoning --reasoning-parser deepseek_r1 --max-model-len 32768 \
  --enforce-eager --no-enable-chunked-prefill --max-model-len 16384
Try --dtype float32?
I solved this by using --no-enable-chunked-prefill.
Example: vllm serve /root/model/Qwen3-8B --dtype half --port 8075 --gpu-memory-utilization 0.8 --no-enable-chunked-prefill --max-model-len 8000
A possible reason is that the V100 does not support chunked prefill.
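If you want to confirm which GPU architecture you are on (the V100 is compute capability 7.0), a quick check, assuming a reasonably recent driver that supports the compute_cap query field:

```bash
# Print GPU name and compute capability; a V100 reports 7.0.
nvidia-smi --query-gpu=name,compute_cap --format=csv
```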