[Bug] AttributeError: module 'vllm._custom_ops' has no attribute 'silu_and_mul'
Checklist
- [x] 1. I have searched related issues but cannot get the expected help.
- [x] 2. The bug has not been fixed in the latest version.
- [x] 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
- [x] 4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose Otherwise, it will be closed.
- [x] 5. Please use English, otherwise it will be closed.
Describe the bug
Hello folks,
I'm attempting to deploy DeepSeek-R1 with SGLang on an AMD MI300X, but I'm running into a compatibility issue. Could someone please help me troubleshoot it?
Reproduction
- build and install triton 3.0.0 from source
- build and install vllm v0.7.2 from source
- build and install sglang (rev 0a6f18f068e4095fc228e798454e8496c9749214) from source
- run:
python3 -m sglang.launch_server --host 0.0.0.0 --port 30000 \
    --model-path ~/deepseek/DeepSeek-R1/ \
    --tp 8 --trust-remote-code \
    --mem-fraction-static 0.70 \
    --served-model-name "DeepSeek-R1" \
    --log-level debug \
    --log-level-http debug \
    --log-requests \
    --enable-metrics \
    --show-time-cost
Then I got this error:
[2025-02-08 05:57:37 TP6] Scheduler hit an exception: Traceback (most recent call last):
File "/home/deepseek/sglang/python/sglang/srt/managers/scheduler.py", line 1787, in run_scheduler_process
scheduler = Scheduler(server_args, port_args, gpu_id, tp_rank, dp_rank)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/deepseek/sglang/python/sglang/srt/managers/scheduler.py", line 240, in __init__
self.tp_worker = TpWorkerClass(
^^^^^^^^^^^^^^
File "/home/deepseek/sglang/python/sglang/srt/managers/tp_worker_overlap_thread.py", line 63, in __init__ [604/1865]
self.worker = TpModelWorker(server_args, gpu_id, tp_rank, dp_rank, nccl_port)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/deepseek/sglang/python/sglang/srt/managers/tp_worker.py", line 68, in __init__
self.model_runner = ModelRunner(
^^^^^^^^^^^^
File "/home/deepseek/sglang/python/sglang/srt/model_executor/model_runner.py", line 215, in __init__
self.init_cuda_graphs()
File "/home/deepseek/sglang/python/sglang/srt/model_executor/model_runner.py", line 730, in init_cuda_graphs
self.cuda_graph_runner = CudaGraphRunner(self)
^^^^^^^^^^^^^^^^^^^^^
File "/home/deepseek/sglang/python/sglang/srt/model_executor/cuda_graph_runner.py", line 232, in __init__
self.capture()
File "/home/deepseek/sglang/python/sglang/srt/model_executor/cuda_graph_runner.py", line 298, in capture
) = self.capture_one_batch_size(bs, forward)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/deepseek/sglang/python/sglang/srt/model_executor/cuda_graph_runner.py", line 375, in capture_one_batch_size
run_once()
File "/home/deepseek/sglang/python/sglang/srt/model_executor/cuda_graph_runner.py", line 368, in run_once
logits_output = forward(input_ids, forward_batch.positions, forward_batch)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/.conda/envs/py312B/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/deepseek/sglang/python/sglang/srt/models/deepseek_v2.py", line 858, in forward
hidden_states = self.model(input_ids, positions, forward_batch)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/.conda/envs/py312B/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/.conda/envs/py312B/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/deepseek/sglang/python/sglang/srt/models/deepseek_v2.py", line 819, in forward
hidden_states, residual = layer(
^^^^^^
File "/home/.conda/envs/py312B/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/.conda/envs/py312B/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/deepseek/sglang/python/sglang/srt/models/deepseek_v2.py", line 774, in forward
hidden_states = self.mlp(hidden_states)
^^^^^^^^^^^^^^^^^^^^^^^
File "/home/.conda/envs/py312B/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/.conda/envs/py312B/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/deepseek/sglang/python/sglang/srt/models/deepseek_v2.py", line 177, in forward
self.experts(hidden_states=hidden_states, router_logits=router_logits)
File "/home/.conda/envs/py312B/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/.conda/envs/py312B/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/deepseek/sglang/python/sglang/srt/layers/moe/fused_moe_triton/layer.py", line 587, in forward
final_hidden_states = self.quant_method.apply(
^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/deepseek/sglang/python/sglang/srt/layers/quantization/fp8.py", line 820, in apply
return fused_experts(
^^^^^^^^^^^^^^
File "/home/deepseek/sglang/python/sglang/srt/layers/moe/fused_moe_triton/fused_moe.py", line 835, in fused_experts
torch.ops.sglang.inplace_fused_experts(
File "/home/.conda/envs/py312B/lib/python3.12/site-packages/torch/_ops.py", line 1123, in __call__
return self._op(*args, **(kwargs or {}))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/deepseek/sglang/python/sglang/srt/layers/moe/fused_moe_triton/fused_moe.py", line 715, in inplace_fused_experts
fused_experts_impl(
File "/home/deepseek/sglang/python/sglang/srt/layers/moe/fused_moe_triton/fused_moe.py", line 992, in fused_experts_impl
ops.silu_and_mul(intermediate_cache2, intermediate_cache1.view(-1, N))
^^^^^^^^^^^^^^^^
AttributeError: module 'vllm._custom_ops' has no attribute 'silu_and_mul'
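A quick way to see what this vllm build actually exposes (a diagnostic sketch, assuming the same environment as above):

```python
# Diagnostic sketch: check which silu_and_mul entry points this vllm build
# exposes. On the failing build from the traceback, the first check is False.
import torch
import vllm._custom_ops as ops

print("vllm._custom_ops.silu_and_mul:", hasattr(ops, "silu_and_mul"))
# The kernel may still be registered directly with torch even when the
# Python re-export is missing from vllm._custom_ops.
print("torch.ops._C.silu_and_mul:", hasattr(torch.ops._C, "silu_and_mul"))
```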
Environment
Python: 3.12.8 | packaged by Anaconda, Inc. | (main, Dec 11 2024, 16:31:09) [GCC 11.2.0]
ROCM available: True
GPU 0,1,2,3,4,5,6,7: AMD Instinct MI300X VF
GPU 0,1,2,3,4,5,6,7 Compute Capability: 9.4
ROCM_HOME: /opt/rocm-6.1.0
HIPCC: HIP version: 6.1.40091-a8dbc0c19
ROCM Driver Version: 6.8.1
PyTorch: 2.6.0+rocm6.1
sglang: 0.4.2.post3
flashinfer: 0.2.0.post2
triton: 3.0.0
transformers: 4.48.3
torchao: 0.8.0
numpy: 1.26.4
aiohttp: 3.11.12
fastapi: 0.115.8
hf_transfer: 0.1.9
huggingface_hub: 0.28.1
interegular: 0.3.3
modelscope: 1.22.3
orjson: 3.10.15
packaging: 24.2
psutil: 6.1.1
pydantic: 2.10.6
multipart: 0.0.20
zmq: 26.2.1
uvicorn: 0.34.0
uvloop: 0.21.0
vllm: 0.7.2
openai: 1.61.1
anthropic: 0.45.2
decord: 0.6.0
AMD Topology:
============================ ROCm System Management Interface ============================
=============================== Link Type between two GPUs ===============================
       GPU0   GPU1   GPU2   GPU3   GPU4   GPU5   GPU6   GPU7
GPU0   0      XGMI   XGMI   XGMI   XGMI   XGMI   XGMI   XGMI
GPU1   XGMI   0      XGMI   XGMI   XGMI   XGMI   XGMI   XGMI
GPU2   XGMI   XGMI   0      XGMI   XGMI   XGMI   XGMI   XGMI
GPU3   XGMI   XGMI   XGMI   0      XGMI   XGMI   XGMI   XGMI
GPU4   XGMI   XGMI   XGMI   XGMI   0      XGMI   XGMI   XGMI
GPU5   XGMI   XGMI   XGMI   XGMI   XGMI   0      XGMI   XGMI
GPU6   XGMI   XGMI   XGMI   XGMI   XGMI   XGMI   0      XGMI
GPU7   XGMI   XGMI   XGMI   XGMI   XGMI   XGMI   XGMI   0
================================== End of ROCm SMI Log ===================================
Thank you for reporting this error. We will try to reproduce it and troubleshoot the issue.
@daizw You should use the AMD Docker image instead of building from source. In the latest SGLang version, the AMD path uses a native implementation rather than the vLLM custom ops. cc @HaiShaw
@daizw, could you please try this Docker image first (https://www.amd.com/en/developer/resources/technical-articles/amd-instinct-gpus-power-deepseek-v3-revolutionizing-ai-development-with-sglang.html)?
If you still encounter this issue, please let me know the error and the packages you installed and built. I will use an AMD MI300 to track down those bugs.
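For reference, a typical ROCm container launch for SGLang looks like the sketch below; the image tag here is an assumption, so take the exact image from the AMD article linked above.

```bash
# Sketch of a ROCm container launch; the image tag is an assumption -- use
# the exact image the AMD article specifies.
# /dev/kfd + /dev/dri expose the ROCm driver and GPUs to the container;
# --ipc=host and a large shm are needed for the 8-way tensor-parallel workers.
docker run -it --rm \
  --device=/dev/kfd --device=/dev/dri \
  --group-add video \
  --ipc=host --shm-size 32g \
  --security-opt seccomp=unconfined \
  -v "$HOME/deepseek/DeepSeek-R1:/models/DeepSeek-R1" \
  lmsysorg/sglang:v0.4.2.post3-rocm620 \
  python3 -m sglang.launch_server --model-path /models/DeepSeek-R1 \
    --tp 8 --trust-remote-code --host 0.0.0.0 --port 30000
```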
You should be able to replace the missing ops.silu_and_mul call with:
torch.ops._C.silu_and_mul(intermediate_cache2, intermediate_cache1.view(-1, N))
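For anyone patching locally, below is a minimal, self-contained sanity check of that fallback (an illustrative sketch, not upstream code; shapes are made up). The kernel writes silu(x[..., :N]) * x[..., N:] into the output buffer, so the result can be verified against plain PyTorch.

```python
# Illustrative sanity check of the suggested fallback. Assumes a GPU build
# of vllm is installed; tensor shapes are arbitrary examples.
import torch
import vllm._custom_ops as ops

N = 128
x = torch.randn(4, 2 * N, device="cuda", dtype=torch.float16)
out = torch.empty(4, N, device="cuda", dtype=torch.float16)

if hasattr(ops, "silu_and_mul"):
    # Builds that re-export the kernel from vllm._custom_ops.
    ops.silu_and_mul(out, x)
else:
    # Fall back to the op registered directly with torch, as suggested above.
    torch.ops._C.silu_and_mul(out, x)

# Reference: silu on the first half of the last dim, times the second half.
ref = torch.nn.functional.silu(x[:, :N]) * x[:, N:]
print(torch.allclose(out, ref, atol=1e-2, rtol=1e-2))
```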
This issue has been automatically closed due to inactivity. Please feel free to reopen it if needed.