mamba Mamba fails to install on AMD GPU ROCm

I'm having issues trying to install Mamba on AMD GPU.

GPU is 7900XTX, Python version 3.10, ROCm version 6.2.3, Pytorch version 2.3.0. numpy is installed with version 1.26.4

Below are the error encountered:

Defaulting to user installation because normal site-packages is not writeable Collecting mamba-ssm[causal-conv1d] Using cached mamba_ssm-2.2.4.tar.gz (91 kB) Installing build dependencies ... done Getting requirements to build wheel ... error error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully. │ exit code: 1 ╰─> [24 lines of output] /tmp/pip-build-env-u3wq8vyd/overlay/local/lib/python3.10/dist-packages/torch/_subclasses/functional_tensor.py:295: UserWarning: Failed to initialize NumPy: No module named 'numpy' (Triggered internally at ../torch/csrc/utils/tensor_numpy.cpp:84.) cpu = _conversion_method_template(device=torch.device("cpu")) No ROCm runtime is found, using ROCM_HOME='/opt/rocm-6.2.3' :118: UserWarning: mamba_ssm was requested, but nvcc was not found. Are you sure your environment has nvcc available? If you're installing within a container from https://hub.docker.com/r/pytorch/pytorch, only images whose names contain 'devel' will provide nvcc.

  torch.__version__  = 2.5.1+cu124


  Traceback (most recent call last):
    File "/home/kaizerbox/.local/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
      main()
    File "/home/kaizerbox/.local/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
      json_out['return_val'] = hook(**hook_input['kwargs'])
    File "/home/kaizerbox/.local/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 118, in get_requires_for_build_wheel
      return hook(config_settings)
    File "/tmp/pip-build-env-u3wq8vyd/overlay/local/lib/python3.10/dist-packages/setuptools/build_meta.py", line 334, in get_requires_for_build_wheel
      return self._get_build_requires(config_settings, requirements=[])
    File "/tmp/pip-build-env-u3wq8vyd/overlay/local/lib/python3.10/dist-packages/setuptools/build_meta.py", line 304, in _get_build_requires
      self.run_setup()
    File "/tmp/pip-build-env-u3wq8vyd/overlay/local/lib/python3.10/dist-packages/setuptools/build_meta.py", line 320, in run_setup
      exec(code, locals())
    File "<string>", line 188, in <module>
  NameError: name 'bare_metal_version' is not defined
  [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip. error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully. │ exit code: 1 ╰─> See above for output.

Jan 15 '25 04:01 KaizerBox

you need this https://rocm.docs.amd.com/en/docs-6.0.0/about/release-notes.html

and numpy

Feb 14 '25 08:02 japrogramer

I found out the problem,

The Mamba pyproject.toml file tells SetupTools to install the CUDA version of Torch, and when we use pip install, pip creates a temporary virtual environment of it's own and then gets that CUDA version of Torch, which then leads to the HIP_BUILD check within mamba's setup.py file to fail as, as well as Torch's own HIP check to fail as torch.version.hip = None; That's why it says no ROCm runtime found even though it clearly found the ROCm folder.

The way I was able to build it successfully using ROCm was by creating a python virtual environment and install the ROCm version of Torch, and also setuptools, wheel, packaging, ninja, einops, transformers, triton and then adding --no-build-isolation to my build command.

I did also apply the official AMD ROCm patch #405 which just declares the WARP_SIZE in a few different areas

The following should get mamba built correctly:

mkdir project1
cd project1
python3 -m venv venv
source venv/bin/activate
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.2.4
pip3 install ninja einops transformers packaging setuptools wheel triton

git clone https://github.com/state-spaces/mamba
git clone https://github.com/Dao-AILab/causal-conv1d
cd causal-conv1d
pip3 install .
cd ../mamba
git checkout 0cce0fa645f100f00620ddf2333c2b7712abfdec
git am ROCm-warp-size-fix-AMD-official-405.patch
pip3 install . --no-build-isolation

And everything should build successfully

Save this as ROCm-warp-size-fix-AMD-official-405.patch:

From 883ea0a28c60031fefd319303161f09f5b5dc8e6 Mon Sep 17 00:00:00 2001
From: amoskvic <[email protected]>
Date: Tue, 18 Jun 2024 21:25:51 -0600
Subject: [PATCH] ROCm warp size fix [AMD official] (#405)

* ROCM conditional compilation fix

* compile flag for warp size

* use #define to set warp size

* fix brace bug

* minor style fix

---------

Co-authored-by: root <[email protected]>
---
 .../selective_scan_bwd_kernel.cuh              |  6 ++++++
 .../selective_scan_fwd_kernel.cuh              |  6 ++++++
 setup.py                                       | 18 ++++++++++++++++++
 3 files changed, 30 insertions(+)

diff --git a/csrc/selective_scan/selective_scan_bwd_kernel.cuh b/csrc/selective_scan/selective_scan_bwd_kernel.cuh
index c720ba2..737420f 100755
--- a/csrc/selective_scan/selective_scan_bwd_kernel.cuh
+++ b/csrc/selective_scan/selective_scan_bwd_kernel.cuh
@@ -536,6 +536,12 @@ template<typename input_t, typename weight_t>
 void selective_scan_bwd_cuda(SSMParamsBwd &params, cudaStream_t stream) {
 
     #ifndef USE_ROCM
+        #define warp_size 32
+    #else
+        #define warp_size ROCM_WARP_SIZE
+    #endif
+
+    #if warp_size == 32 
         if (params.seqlen <= 128) {
             selective_scan_bwd_launch<32, 4, input_t, weight_t>(params, stream);
         } else if (params.seqlen <= 256) {
diff --git a/csrc/selective_scan/selective_scan_fwd_kernel.cuh b/csrc/selective_scan/selective_scan_fwd_kernel.cuh
index 80e9e37..e15ab81 100755
--- a/csrc/selective_scan/selective_scan_fwd_kernel.cuh
+++ b/csrc/selective_scan/selective_scan_fwd_kernel.cuh
@@ -351,6 +351,12 @@ template<typename input_t, typename weight_t>
 void selective_scan_fwd_cuda(SSMParamsBase &params, cudaStream_t stream) {
 
     #ifndef USE_ROCM
+        #define warp_size 32
+    #else
+        #define warp_size ROCM_WARP_SIZE
+    #endif
+
+    #if warp_size == 32
         if (params.seqlen <= 128) {           
             selective_scan_fwd_launch<32, 4, input_t, weight_t>(params, stream);
         } else if (params.seqlen <= 256) {
diff --git a/setup.py b/setup.py
index 7c6196d..4840e63 100755
--- a/setup.py
+++ b/setup.py
@@ -198,6 +198,23 @@ if not SKIP_CUDA_BUILD:
 
     if HIP_BUILD:
 
+        try:
+            # set warp size based on gcn architecure 
+            gcn_arch_name = torch.cuda.get_device_properties(0).gcnArchName
+            if "gfx10" in gcn_arch_name or "gfx11" in gcn_arch_name:
+                # radeon
+                warp_size = 32
+            else:
+                # instinct
+                warp_size = 64
+        except AttributeError as e:
+            # fall back to crude method to set warp size
+            device_name = torch.cuda.get_device_properties(0).name
+            if 'instinct' in device_name.lower():
+                warp_size = 64
+            else:
+                warp_size = 32
+
         extra_compile_args = {
             "cxx": ["-O3", "-std=c++17"],
             "nvcc": [
@@ -207,6 +224,7 @@ if not SKIP_CUDA_BUILD:
                 "-U__CUDA_NO_HALF_OPERATORS__",
                 "-U__CUDA_NO_HALF_CONVERSIONS__",
                 "-fgpu-flush-denormals-to-zero",
+                f"-DROCM_WARP_SIZE={warp_size}"
             ]
             + cc_flag,
         }
-- 
2.48.1

Feb 24 '25 20:02 YellowRoseCx

@YellowRoseCx Thanks 🙌, I just ordered an AMD GPU. Will test soon.

Feb 24 '25 20:02 japrogramer

@YellowRoseCx

Think i followed your guide still getting this.

` RuntimeError: causal_conv1d is only supported on ROCm 6.0 and above. Note: make sure HIP has a supported version by running hipcc --version.

  torch.__version__  = 2.6.0+rocm6.2.4
  
  
  [end of output]

running the causal pip install as well as the pip install mamba setup? `

what version of Python you using. I setup for Python3.10.10 Also,,,are you on Ubuntu or Debian. Debian search paths for hipcc is different than Ubuntu's /opt/rocm. Debian is /usr... this may be why mine is still failing. Ive tried all kinds of exports,,and stil memaba does not find Torch or hipcc.

Im lost.....

Feb 25 '25 00:02 brcisna

Installing it under the root of the repo by python setup.py install works when using rocm/pytorch-training:v25.3 image. It might help in your case.

Mar 10 '25 19:03 blakechi

@blakechi

Have no idea what you you mean rocm/pytorch-training:v25.3 image I am using native Debian 13 Trixie as base OS, Can you expound on this setup you are talking about? TIA

Mar 11 '25 12:03 brcisna

That is the image I tried and here is the link. You can try install mamba inside the container with python setup.py install.

Mar 11 '25 15:03 blakechi

I'm having issues trying to install Mamba on AMD GPU.

GPU is 7900XTX, Python version 3.10, ROCm version 6.2.3, Pytorch version 2.3.0. numpy is installed with version 1.26.4

Below are the error encountered:

Defaulting to user installation because normal site-packages is not writeable Collecting mamba-ssm[causal-conv1d] Using cached mamba_ssm-2.2.4.tar.gz (91 kB) Installing build dependencies ... done Getting requirements to build wheel ... error error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully. │ exit code: 1 ╰─> [24 lines of output] /tmp/pip-build-env-u3wq8vyd/overlay/local/lib/python3.10/dist-packages/torch/_subclasses/functional_tensor.py:295: UserWarning: Failed to initialize NumPy: No module named 'numpy' (Triggered internally at ../torch/csrc/utils/tensor_numpy.cpp:84.) cpu = _conversion_method_template(device=torch.device("cpu")) No ROCm runtime is found, using ROCM_HOME='/opt/rocm-6.2.3' :118: UserWarning: mamba_ssm was requested, but nvcc was not found. Are you sure your environment has nvcc available? If you're installing within a container from https://hub.docker.com/r/pytorch/pytorch, only images whose names contain 'devel' will provide nvcc.
  torch.__version__  = 2.5.1+cu124


  Traceback (most recent call last):
    File "/home/kaizerbox/.local/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
      main()
    File "/home/kaizerbox/.local/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
      json_out['return_val'] = hook(**hook_input['kwargs'])
    File "/home/kaizerbox/.local/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 118, in get_requires_for_build_wheel
      return hook(config_settings)
    File "/tmp/pip-build-env-u3wq8vyd/overlay/local/lib/python3.10/dist-packages/setuptools/build_meta.py", line 334, in get_requires_for_build_wheel
      return self._get_build_requires(config_settings, requirements=[])
    File "/tmp/pip-build-env-u3wq8vyd/overlay/local/lib/python3.10/dist-packages/setuptools/build_meta.py", line 304, in _get_build_requires
      self.run_setup()
    File "/tmp/pip-build-env-u3wq8vyd/overlay/local/lib/python3.10/dist-packages/setuptools/build_meta.py", line 320, in run_setup
      exec(code, locals())
    File "<string>", line 188, in <module>
  NameError: name 'bare_metal_version' is not defined
  [end of output]
note: This error originates from a subprocess, and is likely not a problem with pip. error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully. │ exit code: 1 ╰─> See above for output.

Hi @KaizerBox

I am similarly stumbled upon the following issue and got to make it work on AMD MI300s with following trick.

git clone https://github.com/state-spaces/mamba.git mamba_ssm
cd mamba_ssm
git checkout 014c094
export HIP_ARCHITECTURES="gfx942" # Since MI300
pip install --no-cache-dir --verbose .

Disclaimer 1: Here HIP_ARCHITECTURES=<Your-AMD-GPU-Arch> is explicitly passed to avoid setting --offload-arch=native instead of your actual AMD GPU architecture name (as done in setup.py of mamba).

Disclaimer 2: git checkout 014c094 to avoid issue you faced e.g., NameError: name 'bare_metal_version' is not defined. Refer: https://github.com/jxiw/MambaInLlama/tree/main?tab=readme-ov-file#environment.

I tried it with rocm/pytorch-training docker image.

Apr 03 '25 14:04 rodosingh