Mamba Installation Failed; PyTorch+ROCm version 6.0 & 6.1 not working
I tried to install mamba in two containers on Ubuntu 22.04 LTS: one with ROCm 6.0.2 and PyTorch+rocm6.0 installed, the other with ROCm 6.1.2 and PyTorch+rocm6.1 installed.
Notes on my ROCm 6.1.2 setup: https://github.com/eliranwong/incus_container_gui_setup/blob/main/ubuntu_22.04_LTS_latest_rocm_tested.md
Notes on my ROCm 6.0.2 setup: https://github.com/eliranwong/incus_container_gui_setup/blob/main/ubuntu_22.04_LTS_tested.md
I have already applied the patch https://github.com/state-spaces/mamba/blob/main/rocm_patch/rocm6_0.patch in the container running 6.0.2.
When I run pip install mamba-ssm, I encounter the following errors:
With PyTorch + ROCm 6.0
with open(fin_path, encoding='utf-8') as fin:
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/pip-install-iy9acbtt/mamba-ssm_aae2c1df8bb54f62a59b41fd74fafbe0/csrc/selective_scan/selective_scan.cpp'
torch.__version__ = 2.3.1+rocm6.0
With PyTorch + ROCm 6.1
with open(fin_path, encoding='utf-8') as fin:
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/pip-install-onfy5yn9/mamba-ssm_70828647adec4a73aa94f62ff7a0c1d1/csrc/selective_scan/selective_scan.cpp'
torch.__version__ = 2.5.0.dev20240618+rocm6.1
I also tried to install directly on the host, but no luck; same errors:
...
File "/home/eliran/apps/mamba/lib/python3.10/site-packages/torch/utils/hipify/hipify_python.py", line 826, in preprocessor
with open(fin_path, encoding='utf-8') as fin:
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/pip-install-2d_pvv7e/mamba-ssm_19b07ba0f4b54a60b6feb761a9d6d942/csrc/selective_scan/selective_scan.cpp'
torch.__version__ = 2.3.1+rocm6.0
Same problem on a Docker host, with ROCm 6.0 (with your patch) or 6.1.
pip install mamba-ssm:
error: subprocess-exited-with-error
× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [19 lines of output]
Traceback (most recent call last):
File "<string>", line 2, in <module>
File "<pip-setuptools-caller>", line 34, in <module>
File "/tmp/pip-install-y3jhawqw/mamba-ssm_1617e4dcea5044fabfc486e5325fed98/setup.py", line 239, in <module>
CUDAExtension(
File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 1098, in CUDAExtension
hipify_result = hipify_python.hipify(
File "/usr/local/lib/python3.10/dist-packages/torch/utils/hipify/hipify_python.py", line 1150, in hipify
preprocess_file_and_save_result(output_directory, filepath, all_files, header_include_dirs,
File "/usr/local/lib/python3.10/dist-packages/torch/utils/hipify/hipify_python.py", line 206, in preprocess_file_and_save_result
result = preprocessor(output_directory, filepath, all_files, header_include_dirs, stats,
File "/usr/local/lib/python3.10/dist-packages/torch/utils/hipify/hipify_python.py", line 826, in preprocessor
with open(fin_path, encoding='utf-8') as fin:
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/pip-install-y3jhawqw/mamba-ssm_1617e4dcea5044fabfc486e5325fed98/csrc/selective_scan/selective_scan.cpp'
torch.__version__ = 2.5.0.dev20240620+rocm6.1
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
Compiling from source, I get an error as well.
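For context, the tracebacks above all point at the same spot: PyTorch's hipify step, which CUDAExtension runs during metadata generation, tries to open csrc/selective_scan/selective_scan.cpp under pip's temporary build directory and the file is not there, so open() raises before anything is compiled. A minimal sketch of that failure mode (the path below is hypothetical, not an elided value from the logs):

```python
def preprocess(fin_path):
    # Same open() call as in torch/utils/hipify/hipify_python.py's
    # preprocessor; it assumes the source file already exists on disk.
    with open(fin_path, encoding="utf-8") as fin:
        return fin.read()

# A path that does not exist, standing in for the missing csrc file.
missing = "/tmp/pip-install-hypothetical/csrc/selective_scan/selective_scan.cpp"
try:
    preprocess(missing)
except FileNotFoundError as err:
    print("hipify fails before compiling anything:", err.strerror)
```

This is why the error appears during metadata generation (egg_info) rather than during an actual compile step.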
We don't have support for direct pip installation yet. Can you try building from source:
git clone https://github.com/state-spaces/mamba.git
cd mamba
pip install .
Let me know if that works.
Thanks!
Instead of checking out manually, you can also check out, build, and install in one step:
pip install git+https://github.com/state-spaces/mamba.git
Tried, but unsuccessful, errors:
In file included from /opt/rocm-6.0.2/include/hipcub/backend/rocprim/hipcub.hpp:40:
/opt/rocm-6.0.2/include/hipcub/backend/rocprim/block/block_load.hpp:134:20: error: no member named 'load' in 'rocprim::block_load<unsigned long, 32, 1, rocprim::block_load_method::block_load_warp_transpose>'
base_type::load(block_iter, items, valid_items, temp_storage_);
^
/home/ubuntu/mamba/mamba/csrc/selective_scan/selective_scan_common_hip.h:187:56: note: in instantiation of function template specialization 'hipcub::BlockLoad<unsigned long, 32, 1, hipcub::BLOCK_LOAD_WARP_TRANSPOSE>::Load<unsigned long *>' requested here
typename Ktraits::BlockLoadVecT(smem_load_vec).Load(
^
/home/ubuntu/mamba/mamba/csrc/selective_scan/selective_scan_bwd_kernel_hip.cuh:159:9: note: in instantiation of function template specialization 'load_input<Selective_Scan_bwd_kernel_traits<32, 4, true, true, true, true, true, c10::BFloat16, c10::complex<float>>>' requested here
load_input<Ktraits>(u, u_vals, smem_load, params.seqlen - chunk * kChunkSize);
^
/home/ubuntu/mamba/mamba/csrc/selective_scan/selective_scan_bwd_kernel_hip.cuh:513:40: note: in instantiation of function template specialization 'selective_scan_bwd_kernel<Selective_Scan_bwd_kernel_traits<32, 4, true, true, true, true, true, c10::BFloat16, c10::complex<float>>>' requested here
auto kernel = &selective_scan_bwd_kernel<Ktraits>;
^
/home/ubuntu/mamba/mamba/csrc/selective_scan/selective_scan_bwd_kernel_hip.cuh:548:13: note: in instantiation of function template specialization 'selective_scan_bwd_launch<32, 4, c10::BFloat16, c10::complex<float>>' requested here
selective_scan_bwd_launch<32, 4, input_t, weight_t>(params, stream);
^
fatal error: too many errors emitted, stopping now [-ferror-limit=]
1 warning and 20 errors generated when compiling for host.
error: command '/opt/rocm-6.0.2/bin/hipcc' failed with exit code 1
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for mamba-ssm
Running setup.py clean for mamba-ssm
Failed to build mamba-ssm
Installing collected packages: ninja, urllib3, triton, tqdm, safetensors, regex, pyyaml, idna, einops, charset-normalizer, certifi, requests, huggingface-hub, tokenizers, transformers, mamba-ssm
Running setup.py install for mamba-ssm ... \
I ran this successfully on the rocm/pytorch:latest docker image. Can you try?
@eliranwong We have reproduced this issue and are working to fix it. Thanks for reporting it!
@eliranwong There is some bug related to the warp size of Radeon in one of the rocm libraries. We are working to fix that. For now, we have a temporary fix in which we compile the same kernel launch parameters for both Instinct and Radeon. The performance hit is negligible. Here's the branch for the fix on our repo: https://github.com/rocm-port/mamba-rocm/tree/radeon_tempfix
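To illustrate the idea behind that temporary fix (this is an illustrative sketch with made-up names, not the actual mamba-rocm code): instead of choosing kernel launch parameters from the device's wavefront size, which trips the warp-size bug on Radeon, one conservative configuration is compiled for both Instinct and Radeon.

```python
def launch_params(wavefront_size, use_tempfix=True):
    """Pick (threads, items-per-thread) for a selective-scan-style kernel.

    Hypothetical sketch: the real fix lives in the rocm-port/mamba-rocm
    radeon_tempfix branch; names and numbers here are illustrative only.
    """
    if use_tempfix:
        # Temporary fix: one shared configuration for both GPU families,
        # sidestepping the wavefront-size-dependent code path entirely.
        return {"n_threads": 32, "items_per_thread": 4}
    # Per-device tuning: Instinct GPUs have a wavefront size of 64,
    # consumer Radeon GPUs (RDNA) use 32 -- the divergence that exposes
    # the library bug.
    if wavefront_size == 64:
        return {"n_threads": 64, "items_per_thread": 4}
    return {"n_threads": 32, "items_per_thread": 4}
```

With the temp fix enabled, Instinct and Radeon get identical launch parameters, which is why the performance hit is expected to be negligible.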
Your work and updates are much appreciated. Thanks a lot.
I'm still having this error even with the new PR.
Hi @eliranwong and @george-adams1 ,
I tried installing the same with the rocm/pytorch-training docker image, and it ran successfully. Follow the steps below and let me know if you face any issues.
git clone https://github.com/state-spaces/mamba.git mamba_ssm
cd mamba_ssm
git checkout 014c094
export HIP_ARCHITECTURES="gfx942" # Since MI300
pip install --no-cache-dir --verbose .
Disclaimer 1: HIP_ARCHITECTURES=<Your-AMD-GPU-Arch> is passed explicitly so that the build targets your actual AMD GPU architecture, instead of falling back to --offload-arch=native (as mamba's setup.py does).
Disclaimer 2: git checkout 014c094 avoids issues like NameError: name 'bare_metal_version' is not defined. Refer to: https://github.com/jxiw/MambaInLlama/tree/main?tab=readme-ov-file#environment.
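The behavior Disclaimer 1 works around can be sketched as follows (a minimal, hypothetical model of the arch-selection logic, not mamba's actual setup.py): if HIP_ARCHITECTURES is set, the build compiles for those targets; otherwise it falls back to --offload-arch=native, which may not resolve to your GPU's architecture name.

```python
import os

def offload_arch_flags():
    """Sketch of picking hipcc offload targets from the environment.

    Hypothetical helper for illustration; HIP_ARCHITECTURES is assumed
    to be a semicolon-separated list, e.g. "gfx942" or "gfx90a;gfx942".
    """
    archs = os.environ.get("HIP_ARCHITECTURES")
    if archs:
        # Explicit architectures: one --offload-arch flag per target.
        return ["--offload-arch=" + a for a in archs.split(";")]
    # No explicit setting: fall back to whatever hipcc detects locally.
    return ["--offload-arch=native"]
```

So exporting HIP_ARCHITECTURES="gfx942" before pip install ensures hipcc receives --offload-arch=gfx942 rather than --offload-arch=native.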
Thanks!