[Bug]: vllm 0.4.0.post1 crashed when loading dbrx-instruct on AMD MI250x
Your current environment
- vllm (commit db2a6a41e206abecf4128aba25117fcaf7bebe12) + ROCm 6.0 Docker image built with the fix of Dockerfile.rocm
- 4x AMD MI250x GPUs (each MI250x has 2 GPU dies, 512 GB GPU memory in total; see the quick device check below)
- model: databricks/dbrx-instruct
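As a quick sanity check that all 8 GPU dies are visible inside the container, a minimal sketch using PyTorch's ROCm build (which reports HIP devices through the torch.cuda API); the expected counts are assumptions based on the hardware listed above:

import torch

# ROCm builds of PyTorch expose HIP devices via the CUDA API, so
# 4x MI250x (2 dies each) should appear as 8 devices.
print(torch.cuda.is_available())   # expect: True
print(torch.cuda.device_count())   # expect: 8
for i in range(torch.cuda.device_count()):
    print(i, torch.cuda.get_device_name(i))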
🐛 Describe the bug
Ran the vLLM Docker image with:

docker run --network=host --group-add=video --ipc=host --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --shm-size 32G --device /dev/kfd --device /dev/dri -v $model_dir:/app/model vllm-rocm:v0.4.0.post1 python -m vllm.entrypoints.openai.api_server --port 7860 --model /app/model/models--databricks--dbrx-instruct/snapshots/17365204e9cf13e2296ee984c1ab48071e861efa --trust-remote-code --tensor-parallel-size 8
The vllm server crashed soon after loading the model.
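For what it's worth, the crash should be reproducible without the API server, since it happens during engine initialization. A minimal offline sketch using the standard vllm.LLM entrypoint with the same snapshot path and settings as the server command (the prompt and token count are arbitrary):

from vllm import LLM, SamplingParams

# Offline equivalent of the server invocation above; the constructor alone
# should hit the same DBRX fused-MoE path during the startup profiling run.
llm = LLM(
    model="/app/model/models--databricks--dbrx-instruct/snapshots/17365204e9cf13e2296ee984c1ab48071e861efa",
    trust_remote_code=True,
    tensor_parallel_size=8,
)
print(llm.generate(["Hello"], SamplingParams(max_tokens=8)))

Full server log follows.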
INFO 04-05 23:49:30 llm_engine.py:81] Initializing an LLM engine (v0.4.0.post1) with config: model='/app/model/models--databricks--dbrx-instruct/snapshots/17365204e9cf13e2296ee984c1ab48071e861efa', speculative_config=None, tokenizer='/app/model/models--databricks--dbrx-instruct/snapshots/17365204e9cf13e2296ee984c1ab48071e861efa', tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=32768, download_dir=None, load_format=auto, tensor_parallel_size=8, disable_custom_all_reduce=True, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, seed=0)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
WARNING 04-05 23:49:31 tokenizer.py:104] Using a slow tokenizer. This might cause a significant slowdown. Consider using a fast tokenizer instead.
INFO 04-05 23:49:46 pynccl.py:58] Loading nccl from library librccl.so.1
INFO 04-05 23:49:46 selector.py:34] Cannot use FlashAttention backend for AMD GPUs.
INFO 04-05 23:49:46 selector.py:25] Using XFormers backend.
WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
PyTorch 2.1.1+cu121 with CUDA 1201 (you have 2.1.1+git011de5c)
Python 3.9.18 (you have 3.9.18)
Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
Memory-efficient attention, SwiGLU, sparse and more won't be available.
Set XFORMERS_MORE_DETAILS=1 for more details
(RayWorkerVllm pid=5498) WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
(RayWorkerVllm pid=5498) PyTorch 2.1.1+cu121 with CUDA 1201 (you have 2.1.1+git011de5c)
(RayWorkerVllm pid=5498) Python 3.9.18 (you have 3.9.18)
(RayWorkerVllm pid=5498) Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
(RayWorkerVllm pid=5498) Memory-efficient attention, SwiGLU, sparse and more won't be available.
(RayWorkerVllm pid=5498) Set XFORMERS_MORE_DETAILS=1 for more details
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/cuda/__init__.py:611: UserWarning: Can't initialize NVML
warnings.warn("Can't initialize NVML")
(RayWorkerVllm pid=5498) /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/cuda/__init__.py:611: UserWarning: Can't initialize NVML
(RayWorkerVllm pid=5498) warnings.warn("Can't initialize NVML")
(RayWorkerVllm pid=5498) INFO 04-05 23:49:47 pynccl.py:58] Loading nccl from library librccl.so.1
(RayWorkerVllm pid=5498) INFO 04-05 23:49:47 selector.py:34] Cannot use FlashAttention backend for AMD GPUs.
(RayWorkerVllm pid=5498) INFO 04-05 23:49:47 selector.py:25] Using XFormers backend.
(RayWorkerVllm pid=5498) INFO 04-05 23:49:48 pynccl_utils.py:45] vLLM is using nccl==2.18.3
INFO 04-05 23:49:49 pynccl_utils.py:45] vLLM is using nccl==2.18.3
INFO 04-05 23:50:13 model_runner.py:104] Loading model weights took 30.6567 GB
error: LLVM Translation failed for operation: builtin.unrealized_conversion_cast
Failed to emit LLVM IR
Translate to LLVM IR failed
LLVM ERROR: Failed to translate TritonGPU to LLVM IR.
*** SIGABRT received at time=1712361058 on cpu 4 ***
PC: @ 0x7f47c6d6b00b (unknown) raise
@ 0x7f47c7088420 (unknown) (unknown)
@ 0x100000000 (unknown) (unknown)
@ ... and at least 1 more frames
[2024-04-05 23:50:58,762 E 1 1] logging.cc:361: *** SIGABRT received at time=1712361058 on cpu 4 ***
[2024-04-05 23:50:58,762 E 1 1] logging.cc:361: PC: @ 0x7f47c6d6b00b (unknown) raise
[2024-04-05 23:50:58,763 E 1 1] logging.cc:361: @ 0x7f47c7088420 (unknown) (unknown)
[2024-04-05 23:50:58,765 E 1 1] logging.cc:361: @ 0x100000000 (unknown) (unknown)
[2024-04-05 23:50:58,765 E 1 1] logging.cc:361: @ ... and at least 1 more frames
Fatal Python error: Aborted
Stack (most recent call first):
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/triton/compiler/compiler.py", line 114 in ttgir_to_llir
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/triton/compiler/compiler.py", line 417 in <lambda>
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/triton/compiler/compiler.py", line 509 in compile
File "<string>", line 63 in fused_moe_kernel
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/vllm-0.4.0.post1+rocm603-py3.9-linux-x86_64.egg/vllm/model_executor/layers/fused_moe/fused_moe.py", line 222 in invok
e_fused_moe_kernel
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/vllm-0.4.0.post1+rocm603-py3.9-linux-x86_64.egg/vllm/model_executor/layers/fused_moe/fused_moe.py", line 397 in fused
_moe
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/vllm-0.4.0.post1+rocm603-py3.9-linux-x86_64.egg/vllm/model_executor/models/dbrx.py", line 148 in forward
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527 in _call_impl
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518 in _wrapped_call_impl
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/vllm-0.4.0.post1+rocm603-py3.9-linux-x86_64.egg/vllm/model_executor/models/dbrx.py", line 302 in forward
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527 in _call_impl
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518 in _wrapped_call_impl
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/vllm-0.4.0.post1+rocm603-py3.9-linux-x86_64.egg/vllm/model_executor/models/dbrx.py", line 338 in forward
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527 in _call_impl
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518 in _wrapped_call_impl
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/vllm-0.4.0.post1+rocm603-py3.9-linux-x86_64.egg/vllm/model_executor/models/dbrx.py", line 377 in forward
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527 in _call_impl
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518 in _wrapped_call_impl
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/vllm-0.4.0.post1+rocm603-py3.9-linux-x86_64.egg/vllm/worker/model_runner.py", line 683 in execute_model
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115 in decorate_context
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/vllm-0.4.0.post1+rocm603-py3.9-linux-x86_64.egg/vllm/worker/model_runner.py", line 762 in profile_run
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115 in decorate_context
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/vllm-0.4.0.post1+rocm603-py3.9-linux-x86_64.egg/vllm/worker/worker.py", line 131 in profile_num_available_blocks
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115 in decorate_context
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/vllm-0.4.0.post1+rocm603-py3.9-linux-x86_64.egg/vllm/executor/ray_gpu_executor.py", line 328 in _run_workers
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/vllm-0.4.0.post1+rocm603-py3.9-linux-x86_64.egg/vllm/executor/ray_gpu_executor.py", line 224 in _init_cache
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/vllm-0.4.0.post1+rocm603-py3.9-linux-x86_64.egg/vllm/executor/ray_gpu_executor.py", line 69 in __init__
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/vllm-0.4.0.post1+rocm603-py3.9-linux-x86_64.egg/vllm/engine/llm_engine.py", line 119 in __init__
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/vllm-0.4.0.post1+rocm603-py3.9-linux-x86_64.egg/vllm/engine/async_llm_engine.py", line 421 in _init_engine
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/vllm-0.4.0.post1+rocm603-py3.9-linux-x86_64.egg/vllm/engine/async_llm_engine.py", line 311 in __init__
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/vllm-0.4.0.post1+rocm603-py3.9-linux-x86_64.egg/vllm/engine/async_llm_engine.py", line 347 in from_engine_args
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/vllm-0.4.0.post1+rocm603-py3.9-linux-x86_64.egg/vllm/entrypoints/openai/api_server.py", line 157 in <module>
File "/opt/conda/envs/py_3.9/lib/python3.9/runpy.py", line 87 in _run_code
File "/opt/conda/envs/py_3.9/lib/python3.9/runpy.py", line 197 in _run_module_as_main
[failure_signal_handler.cc : 332] RAW: Signal 11 raised at PC=0x7f47c6d4a941 while already in AbslFailureSignalHandler()
*** SIGSEGV received at time=1712361058 on cpu 4 ***
PC: @ 0x7f47c6d4a941 (unknown) abort
@ 0x7f47c7088420 (unknown) (unknown)
@ 0x100000000 (unknown) (unknown)
@ ... and at least 1 more frames
[2024-04-05 23:50:58,768 E 1 1] logging.cc:361: *** SIGSEGV received at time=1712361058 on cpu 4 ***
[2024-04-05 23:50:58,768 E 1 1] logging.cc:361: PC: @ 0x7f47c6d4a941 (unknown) abort
[2024-04-05 23:50:58,769 E 1 1] logging.cc:361: @ 0x7f47c7088420 (unknown) (unknown)
[2024-04-05 23:50:58,770 E 1 1] logging.cc:361: @ 0x100000000 (unknown) (unknown)
[2024-04-05 23:50:58,770 E 1 1] logging.cc:361: @ ... and at least 1 more frames
Fatal Python error: Segmentation fault
Stack (most recent call first):
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/triton/compiler/compiler.py", line 114 in ttgir_to_llir
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/triton/compiler/compiler.py", line 417 in <lambda>
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/triton/compiler/compiler.py", line 509 in compile
File "<string>", line 63 in fused_moe_kernel
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/vllm-0.4.0.post1+rocm603-py3.9-linux-x86_64.egg/vllm/model_executor/layers/fused_moe/fused_moe.py", line 222 in invok
e_fused_moe_kernel
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/vllm-0.4.0.post1+rocm603-py3.9-linux-x86_64.egg/vllm/model_executor/layers/fused_moe/fused_moe.py", line 397 in fused
_moe
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/vllm-0.4.0.post1+rocm603-py3.9-linux-x86_64.egg/vllm/model_executor/models/dbrx.py", line 148 in forward
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527 in _call_impl
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518 in _wrapped_call_impl
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/vllm-0.4.0.post1+rocm603-py3.9-linux-x86_64.egg/vllm/model_executor/models/dbrx.py", line 302 in forward
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527 in _call_impl
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518 in _wrapped_call_impl
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/vllm-0.4.0.post1+rocm603-py3.9-linux-x86_64.egg/vllm/model_executor/models/dbrx.py", line 338 in forward
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527 in _call_impl
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518 in _wrapped_call_impl
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/vllm-0.4.0.post1+rocm603-py3.9-linux-x86_64.egg/vllm/model_executor/models/dbrx.py", line 377 in forward
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527 in _call_impl
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518 in _wrapped_call_impl
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/vllm-0.4.0.post1+rocm603-py3.9-linux-x86_64.egg/vllm/worker/model_runner.py", line 683 in execute_model
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115 in decorate_context
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/vllm-0.4.0.post1+rocm603-py3.9-linux-x86_64.egg/vllm/worker/model_runner.py", line 762 in profile_run
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115 in decorate_context
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/vllm-0.4.0.post1+rocm603-py3.9-linux-x86_64.egg/vllm/worker/worker.py", line 131 in profile_num_available_blocks
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115 in decorate_context
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/vllm-0.4.0.post1+rocm603-py3.9-linux-x86_64.egg/vllm/executor/ray_gpu_executor.py", line 328 in _run_workers
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/vllm-0.4.0.post1+rocm603-py3.9-linux-x86_64.egg/vllm/executor/ray_gpu_executor.py", line 224 in _init_cache
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/vllm-0.4.0.post1+rocm603-py3.9-linux-x86_64.egg/vllm/executor/ray_gpu_executor.py", line 69 in __init__
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/vllm-0.4.0.post1+rocm603-py3.9-linux-x86_64.egg/vllm/engine/llm_engine.py", line 119 in __init__
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/vllm-0.4.0.post1+rocm603-py3.9-linux-x86_64.egg/vllm/engine/async_llm_engine.py", line 421 in _init_engine
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/vllm-0.4.0.post1+rocm603-py3.9-linux-x86_64.egg/vllm/engine/async_llm_engine.py", line 311 in __init__
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/vllm-0.4.0.post1+rocm603-py3.9-linux-x86_64.egg/vllm/engine/async_llm_engine.py", line 347 in from_engine_args
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/vllm-0.4.0.post1+rocm603-py3.9-linux-x86_64.egg/vllm/entrypoints/openai/api_server.py", line 157 in <module>
File "/opt/conda/envs/py_3.9/lib/python3.9/runpy.py", line 87 in _run_code
File "/opt/conda/envs/py_3.9/lib/python3.9/runpy.py", line 197 in _run_module_as_main
Hi @vgod-dbx, please try again with the Dockerfile.rocm here: EDIT: The Dockerfile.rocm from the top of tree should now work!
The change installs a ROCm fork of Triton; it also contains the numba upgrade we discussed in the other thread.
I've tested a Docker image built from the above Dockerfile on 4x MI250X using the config you specified, and it appears to work fine.
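After rebuilding, a quick way to confirm the container picked up the new Triton before retrying the server (the exact version string of the ROCm fork is an assumption I can't verify here; the point is that the import should resolve to the fork installed by the new Dockerfile):

import triton

# Confirm which Triton is installed: the path should point inside the
# rebuilt image's site-packages, and the version should match the ROCm fork.
print(triton.__version__)
print(triton.__file__)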
@mawong-amd I can confirm the new container worked! Thanks for the swift response!
It failed for me on MI250x as well; is it possible for you to share your image?