
[Bug] AttributeError: module 'vllm._custom_ops' has no attribute 'silu_and_mul'

Open daizw opened this issue 10 months ago • 3 comments

Checklist

  • [x] 1. I have searched related issues but cannot get the expected help.
  • [x] 2. The bug has not been fixed in the latest version.
  • [x] 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
  • [x] 4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose. Otherwise, it will be closed.
  • [x] 5. Please use English, otherwise it will be closed.

Describe the bug

Hello folks,

I'm attempting to deploy DeepSeek-R1 with SGLang on an AMD MI300X, but I'm running into compatibility problems. Could someone please help me troubleshoot?

Reproduction

  1. build and install triton 3.0.0 from source
  2. build and install vllm v0.7.2 from source
  3. build and install sglang (rev 0a6f18f068e4095fc228e798454e8496c9749214) from source
  4. run:

         python3 -m sglang.launch_server --host 0.0.0.0 --port 30000 \
             --model-path ~/deepseek/DeepSeek-R1/ \
             --tp 8 --trust-remote-code \
             --mem-fraction-static 0.70 \
             --served-model-name "DeepSeek-R1" \
             --log-level debug \
             --log-level-http debug \
             --log-requests \
             --enable-metrics \
             --show-time-cost

Then I got this error:

[2025-02-08 05:57:37 TP6] Scheduler hit an exception: Traceback (most recent call last):
  File "/home/deepseek/sglang/python/sglang/srt/managers/scheduler.py", line 1787, in run_scheduler_process
    scheduler = Scheduler(server_args, port_args, gpu_id, tp_rank, dp_rank)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/deepseek/sglang/python/sglang/srt/managers/scheduler.py", line 240, in __init__
    self.tp_worker = TpWorkerClass(
                     ^^^^^^^^^^^^^^
File "/home/deepseek/sglang/python/sglang/srt/managers/tp_worker_overlap_thread.py", line 63, in __init__                                                                                                       [604/1865]
    self.worker = TpModelWorker(server_args, gpu_id, tp_rank, dp_rank, nccl_port)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/deepseek/sglang/python/sglang/srt/managers/tp_worker.py", line 68, in __init__
    self.model_runner = ModelRunner(
                        ^^^^^^^^^^^^
  File "/home/deepseek/sglang/python/sglang/srt/model_executor/model_runner.py", line 215, in __init__
    self.init_cuda_graphs()
  File "/home/deepseek/sglang/python/sglang/srt/model_executor/model_runner.py", line 730, in init_cuda_graphs
    self.cuda_graph_runner = CudaGraphRunner(self)
                             ^^^^^^^^^^^^^^^^^^^^^
  File "/home/deepseek/sglang/python/sglang/srt/model_executor/cuda_graph_runner.py", line 232, in __init__
    self.capture()
  File "/home/deepseek/sglang/python/sglang/srt/model_executor/cuda_graph_runner.py", line 298, in capture
    ) = self.capture_one_batch_size(bs, forward)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/deepseek/sglang/python/sglang/srt/model_executor/cuda_graph_runner.py", line 375, in capture_one_batch_size
    run_once()
  File "/home/deepseek/sglang/python/sglang/srt/model_executor/cuda_graph_runner.py", line 368, in run_once
    logits_output = forward(input_ids, forward_batch.positions, forward_batch)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/.conda/envs/py312B/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/deepseek/sglang/python/sglang/srt/models/deepseek_v2.py", line 858, in forward
    hidden_states = self.model(input_ids, positions, forward_batch)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/.conda/envs/py312B/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/.conda/envs/py312B/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/deepseek/sglang/python/sglang/srt/models/deepseek_v2.py", line 819, in forward
    hidden_states, residual = layer(
                              ^^^^^^
  File "/home/.conda/envs/py312B/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/.conda/envs/py312B/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/deepseek/sglang/python/sglang/srt/models/deepseek_v2.py", line 774, in forward
    hidden_states = self.mlp(hidden_states)
                    ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/.conda/envs/py312B/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/.conda/envs/py312B/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/deepseek/sglang/python/sglang/srt/models/deepseek_v2.py", line 177, in forward
    self.experts(hidden_states=hidden_states, router_logits=router_logits)
  File "/home/.conda/envs/py312B/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/.conda/envs/py312B/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/deepseek/sglang/python/sglang/srt/layers/moe/fused_moe_triton/layer.py", line 587, in forward
    final_hidden_states = self.quant_method.apply(
                          ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/deepseek/sglang/python/sglang/srt/layers/quantization/fp8.py", line 820, in apply
    return fused_experts(
           ^^^^^^^^^^^^^^
  File "/home/deepseek/sglang/python/sglang/srt/layers/moe/fused_moe_triton/fused_moe.py", line 835, in fused_experts
    torch.ops.sglang.inplace_fused_experts(
  File "/home/.conda/envs/py312B/lib/python3.12/site-packages/torch/_ops.py", line 1123, in __call__
    return self._op(*args, **(kwargs or {}))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/deepseek/sglang/python/sglang/srt/layers/moe/fused_moe_triton/fused_moe.py", line 715, in inplace_fused_experts
    fused_experts_impl(
  File "/home/deepseek/sglang/python/sglang/srt/layers/moe/fused_moe_triton/fused_moe.py", line 992, in fused_experts_impl
    ops.silu_and_mul(intermediate_cache2, intermediate_cache1.view(-1, N))
    ^^^^^^^^^^^^^^^^
AttributeError: module 'vllm._custom_ops' has no attribute 'silu_and_mul'
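For context, the traceback shows that SGLang's Triton fused-MoE path delegates the activation to vllm._custom_ops, and this ROCm build of vLLM 0.7.2 does not expose that op. The missing attribute can be confirmed in isolation, without launching the full server (a minimal check, assuming the same Python environment):

    # Minimal reproduction of the AttributeError, independent of SGLang.
    import vllm._custom_ops as ops

    print(hasattr(ops, "silu_and_mul"))  # prints False on this build
    ops.silu_and_mul                     # raises the AttributeError above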

Environment

Python: 3.12.8 | packaged by Anaconda, Inc. | (main, Dec 11 2024, 16:31:09) [GCC 11.2.0]
ROCM available: True
GPU 0,1,2,3,4,5,6,7: AMD Instinct MI300X VF
GPU 0,1,2,3,4,5,6,7 Compute Capability: 9.4
ROCM_HOME: /opt/rocm-6.1.0
HIPCC: HIP version: 6.1.40091-a8dbc0c19
ROCM Driver Version: 6.8.1
PyTorch: 2.6.0+rocm6.1
sglang: 0.4.2.post3
flashinfer: 0.2.0.post2
triton: 3.0.0
transformers: 4.48.3
torchao: 0.8.0
numpy: 1.26.4
aiohttp: 3.11.12
fastapi: 0.115.8
hf_transfer: 0.1.9
huggingface_hub: 0.28.1
interegular: 0.3.3
modelscope: 1.22.3
orjson: 3.10.15
packaging: 24.2
psutil: 6.1.1
pydantic: 2.10.6
multipart: 0.0.20
zmq: 26.2.1
uvicorn: 0.34.0
uvloop: 0.21.0
vllm: 0.7.2
openai: 1.61.1
anthropic: 0.45.2
decord: 0.6.0

AMD Topology:

============================ ROCm System Management Interface ============================
=============================== Link Type between two GPUs ===============================
       GPU0  GPU1  GPU2  GPU3  GPU4  GPU5  GPU6  GPU7
GPU0   0     XGMI  XGMI  XGMI  XGMI  XGMI  XGMI  XGMI
GPU1   XGMI  0     XGMI  XGMI  XGMI  XGMI  XGMI  XGMI
GPU2   XGMI  XGMI  0     XGMI  XGMI  XGMI  XGMI  XGMI
GPU3   XGMI  XGMI  XGMI  0     XGMI  XGMI  XGMI  XGMI
GPU4   XGMI  XGMI  XGMI  XGMI  0     XGMI  XGMI  XGMI
GPU5   XGMI  XGMI  XGMI  XGMI  XGMI  0     XGMI  XGMI
GPU6   XGMI  XGMI  XGMI  XGMI  XGMI  XGMI  0     XGMI
GPU7   XGMI  XGMI  XGMI  XGMI  XGMI  XGMI  XGMI  0
================================== End of ROCm SMI Log ===================================

daizw avatar Feb 08 '25 06:02 daizw

Thank you for reporting this error. We will try to reproduce it and troubleshoot the issue.

jhinpan avatar Feb 08 '25 07:02 jhinpan

@daizw You should use the AMD Docker image instead of building from source. In the latest SGLang version, AMD uses the native implementation. cc @HaiShaw

zhyncs avatar Feb 08 '25 08:02 zhyncs

@daizw, could you please try this Docker image first: https://www.amd.com/en/developer/resources/technical-articles/amd-instinct-gpus-power-deepseek-v3-revolutionizing-ai-development-with-sglang.html

If you keep encountering this issue, please let me know the error and the new packages you installed and built. I will use an AMD MI300 to track down those bugs.

yushengsu-thu avatar Feb 10 '25 04:02 yushengsu-thu

You should be able to replace the missing ops.silu_and_mul call with:

    torch.ops._C.silu_and_mul(intermediate_cache2,
                              intermediate_cache1.view(-1, N))
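If torch.ops._C does not expose the op on your build either, a pure-PyTorch fallback with the same semantics can unblock you (a minimal sketch, assuming vLLM's layout where the first half of the last dimension is the gate; much slower than the fused kernel):

    import torch
    import torch.nn.functional as F

    def silu_and_mul(out: torch.Tensor, x: torch.Tensor) -> None:
        # x has shape (..., 2 * d): SiLU-gate the first half of the last
        # dimension, multiply elementwise by the second half, write into out.
        d = x.shape[-1] // 2
        out.copy_(F.silu(x[..., :d]) * x[..., d:])

It is called the same way as above: silu_and_mul(intermediate_cache2, intermediate_cache1.view(-1, N)).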

etiennemlb avatar Mar 03 '25 17:03 etiennemlb

This issue has been automatically closed due to inactivity. Please feel free to reopen it if needed.

github-actions[bot] avatar May 03 '25 00:05 github-actions[bot]