Qwen-32B hangs when running 20K/12K (input/output) on 4 GPUs
Describe the bug
When running DeepSeek-R1-Distill-Qwen-32B on 4 B60 GPUs with 20K input / 12K output tokens, the run hangs, even at concurrency 1.
How to reproduce
Steps to reproduce the error:
- Start the server:

  MAX_NUM_BATCHED_TOKENS=${MAX_NUM_BATCHED_TOKENS:-40000}
  MAX_MODEL_LEN=${MAX_MODEL_LEN:-40000}
  export VLLM_USE_V1=1
  python -m ipex_llm.vllm.xpu.entrypoints.openai.api_server \
    --served-model-name $SERVED_MODEL_NAME \
    --port $PORT \
    --model $MODEL_PATH \
    --trust-remote-code \
    --block-size 64 \
    --gpu-memory-utilization 0.95 \
    --device xpu \
    --dtype float16 \
    --enforce-eager \
    --load-in-low-bit $LOAD_IN_LOW_BIT \
    --max-model-len $MAX_MODEL_LEN \
    --max-num-batched-tokens $MAX_NUM_BATCHED_TOKENS \
    --max-num-seqs $MAX_NUM_SEQS \
    --tensor-parallel-size $TENSOR_PARALLEL_SIZE \
    --distributed-executor-backend ray

- Run the client benchmark:

  input_length=20480
  output_length=12288
  for bsize in 1 2 4 8 10; do
    echo "benchmark serving bs${bsize}"
    python /llm/vllm/benchmarks/benchmark_serving.py \
      --model ${modelname} \
      --served-model-name ${servedname} \
      --dataset-name random \
      --trust_remote_code \
      --ignore-eos \
      --num_prompt $bsize \
      --random-input-len=$input_length \
      --random-output-len=$output_length \
      --port 8000
  done
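The request shape can be checked against the server limits with plain shell arithmetic. This is a minimal sketch: it reuses the variable names from the reproduce steps and assumes the server really ran with the 40000-token defaults shown there.

```shell
#!/usr/bin/env bash
# Sketch: check the 20K/12K request shape against the server limits.
# Defaults mirror the reproduce steps; override them via the environment.
input_length=${input_length:-20480}
output_length=${output_length:-12288}
MAX_MODEL_LEN=${MAX_MODEL_LEN:-40000}
MAX_NUM_BATCHED_TOKENS=${MAX_NUM_BATCHED_TOKENS:-40000}

# Each request must hold its whole prompt plus every generated token.
per_request=$(( input_length + output_length ))
echo "tokens per request: ${per_request}"

if (( per_request <= MAX_MODEL_LEN )); then
  echo "fits in --max-model-len ${MAX_MODEL_LEN}"
else
  echo "exceeds --max-model-len ${MAX_MODEL_LEN}"
fi

# How many complete prompts fit into one batch of MAX_NUM_BATCHED_TOKENS tokens.
full_prefills=$(( MAX_NUM_BATCHED_TOKENS / input_length ))
echo "complete prompts per batch: ${full_prefills}"
```

With these defaults each request needs 20480 + 12288 = 32768 tokens, which is within the 40000-token limit, so the configured context length by itself should not be what blocks the concurrency-1 run.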
Environment information
root@w05:/home/intel/ipex-llm/python/llm/scripts# bash env-check.sh
PYTHON_VERSION=3.11.13
[W617 13:56:29.995243467 OperatorEntry.cpp:154] Warning: Warning only once for all operators, other operators may also be overridden. Overriding a previously registered kernel for the same operator and the same dispatch key operator: aten::_validate_compressed_sparse_indices(bool is_crow, Tensor compressed_idx, Tensor plain_idx, int cdim, int dim, int nnz) -> () registered at /pytorch/build/aten/src/ATen/RegisterSchema.cpp:6 dispatch key: XPU previous kernel: registered at /pytorch/build/aten/src/ATen/RegisterCPU.cpp:30477 new kernel: registered at /build/intel-pytorch-extension/build/Release/csrc/gpu/csrc/aten/generated/ATen/RegisterXPU.cpp:468 (function operator())
transformers=4.52.4
torch=2.6.0+xpu
ipex-llm Version: 2.3.0b20250610
ipex=2.6.10+xpu
CPU Information:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 46 bits physical, 57 bits virtual
Byte Order: Little Endian
CPU(s): 64
On-line CPU(s) list: 0-63
Vendor ID: GenuineIntel
BIOS Vendor ID: Intel(R) Corporation
Model name: Intel(R) Xeon(R) w7-3565X
BIOS Model name: Intel(R) Xeon(R) w7-3565X CPU @ 2.5GHz
BIOS CPU family: 179
CPU family: 6
Model: 143
Thread(s) per core: 2
Core(s) per socket: 32
Socket(s): 1
Stepping: 8
Total CPU Memory: 247.097 GB
Operating System: Ubuntu 24.04.1 LTS
Linux w05 6.14.0-15-generic #15-Ubuntu SMP PREEMPT_DYNAMIC Sun Apr 6 15:05:05 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
env-check.sh: line 148: xpu-smi: command not found
env-check.sh: line 154: clinfo: command not found
Driver related package version:
igpu not detected
xpu-smi is not installed. Please install xpu-smi according to README.md
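Because xpu-smi and clinfo are missing, env-check.sh reports no GPU details at all. A minimal sketch, assuming pciutils (`lspci`) is available in the environment, that lists the Intel GPU PCI device IDs so each of the 4 cards can be matched against the 0xe211 ID from the additional context. The class-name filter ("VGA", "Display", "3D") is an assumption about how the cards are labelled, and `intel_gpu_ids` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: list Intel GPU PCI device IDs without xpu-smi or clinfo.
# Assumes `lspci` (pciutils) is installed; the class-name filter is an assumption.

# Reads `lspci -nn` output on stdin and prints one Intel (vendor 8086)
# device ID per GPU line, without the vendor prefix (e.g. "e211").
intel_gpu_ids() {
  grep -E 'VGA compatible controller|Display controller|3D controller' \
    | grep -oE '\[8086:[0-9a-fA-F]{4}\]' \
    | sed -E 's/\[8086:([0-9a-fA-F]{4})\]/\1/'
}

if command -v lspci >/dev/null 2>&1; then
  lspci -nn | intel_gpu_ids
else
  echo "lspci not found; install pciutils" >&2
fi
```

Filtering on the display classes first keeps other Intel PCI functions (bridges, controllers) out of the list, since they share vendor ID 8086.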
Additional context
This issue is caused by the BMG card (device ID 0xe211). We may need to upgrade compute-runtime in the container to the latest version to get normal performance.
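Before and after upgrading, it helps to record which compute-runtime packages the container actually ships. A minimal sketch for a Debian/Ubuntu-based image: the package names in `runtime_pkgs` are assumptions based on the compute-runtime .deb releases (names have changed between releases), and `report_runtime_versions` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: report installed Intel compute-runtime packages inside the container.
# Package names are assumptions; adjust them to what the image actually ships.
runtime_pkgs=(intel-opencl-icd intel-level-zero-gpu libze-intel-gpu1 libigdgmm12)

report_runtime_versions() {
  local pkg ver
  for pkg in "${runtime_pkgs[@]}"; do
    # dpkg-query prints the version when the package is known;
    # its error output is discarded when it is not.
    ver=$(dpkg-query -W -f='${Version}' "$pkg" 2>/dev/null)
    echo "${pkg}: ${ver:-not installed}"
  done
}

report_runtime_versions
```

The printed versions can then be compared with the latest release on the compute-runtime releases page.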