[AMD GPU installation] The Rocm-bitsandbytes installation issues

Open haic0 opened this issue 8 months ago • 13 comments

System Info

Issue: Installing the latest multi-backend-refactor branch fails on an AMD GPU. After switching to the ROCm/bitsandbytes repo and using the rocm_enabled_multi_backend branch, the installation succeeds. Could you please check whether the right branch was selected? Thanks so much!

Official Repo: https://github.com/bitsandbytes-foundation/bitsandbytes/tree/multi-backend-refactor

Test Environment: AMD MI300X GPU

Docker image:

ROCm 6.4 is the latest AMD ROCm release.

docker pull rocm/pytorch:rocm6.4_ubuntu24.04_py3.12_pytorch_release_2.4.1

Reproduction

How to reproduce: Step by Step:

ROCm 6.4 is the latest AMD ROCm release.

docker pull rocm/pytorch:rocm6.4_ubuntu24.04_py3.12_pytorch_release_2.4.1

docker run -it --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --device=/dev/kfd --device=/dev/dri -v /:/workspace --group-add video --ipc=host --name bitsandbytes01 rocm/pytorch:rocm6.4_ubuntu24.04_py3.12_pytorch_release_2.4.1

Inside the docker:

git clone -b multi-backend-refactor https://github.com/bitsandbytes-foundation/bitsandbytes.git && cd bitsandbytes/

Compile & install

apt-get install -y build-essential cmake
cmake -DCOMPUTE_BACKEND=hip -S .
make
pip install -e .
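
Before running pip, it can help to confirm that the build actually produced a ROCm library file. The sketch below is only an illustration: the helper name `find_rocm_libraries` is hypothetical, and the `libbitsandbytes_rocm*.so` pattern is inferred from the file name that appears in the error output of this report, not from the build scripts.

```python
from pathlib import Path


def find_rocm_libraries(package_dir):
    """Return the ROCm shared libraries found in package_dir, sorted by name.

    The glob pattern is an assumption based on the file name
    libbitsandbytes_rocm64.so seen in this report.
    """
    return sorted(Path(package_dir).glob("libbitsandbytes_rocm*.so"))


if __name__ == "__main__":
    # Run from the repository root; the compiled library is expected
    # inside the bitsandbytes/ package directory.
    for library in find_rocm_libraries("bitsandbytes"):
        print(library)
```

Note that a file being present does not mean it loads; in this report the library exists but fails at load time with an undefined symbol.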

After the installation completes:


Successfully built bitsandbytes
Installing collected packages: bitsandbytes
  Attempting uninstall: bitsandbytes
    Found existing installation: bitsandbytes 1.0.0
    Uninstalling bitsandbytes-1.0.0:
      Successfully uninstalled bitsandbytes-1.0.0
Successfully installed bitsandbytes-1.0.0


Verify the installation once done:

python -m bitsandbytes


root@93db47d5b637:/var/lib/jenkins/bitsandbytes# python -m bitsandbytes
Could not load bitsandbytes native library: /var/lib/jenkins/bitsandbytes/bitsandbytes/libbitsandbytes_rocm64.so: undefined symbol: _Z36__device_stub__kOptimizer32bit1StateI12hip_bfloat16Li2EEvPT_S2_PfS3_ffffffiffbi.
If you use Intel CPU or XPU, please pip install intel_extension_for_pytorch
Traceback (most recent call last):
  File "/var/lib/jenkins/bitsandbytes/bitsandbytes/cextension.py", line 115, in <module>
    lib = get_native_library()
          ^^^^^^^^^^^^^^^^^^^^
  File "/var/lib/jenkins/bitsandbytes/bitsandbytes/cextension.py", line 86, in get_native_library
    dll = ct.cdll.LoadLibrary(str(binary_path))
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/py_3.12/lib/python3.12/ctypes/__init__.py", line 460, in LoadLibrary
    return self._dlltype(name)
           ^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/py_3.12/lib/python3.12/ctypes/__init__.py", line 379, in __init__
    self._handle = _dlopen(self._name, mode)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: /var/lib/jenkins/bitsandbytes/bitsandbytes/libbitsandbytes_rocm64.so: undefined symbol: _Z36__device_stub__kOptimizer32bit1StateI12hip_bfloat16Li2EEvPT_S2_PfS3_ffffffiffbi
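
The load failure above can be reproduced outside bitsandbytes by opening the shared library directly with `ctypes`, which isolates the linker problem from the rest of the package. This is a minimal sketch, not part of bitsandbytes: the helper names `extract_undefined_symbol` and `try_load` are hypothetical, and the regular expression assumes the dynamic loader phrases its message as "undefined symbol: NAME", as it does in the trace above.

```python
import ctypes
import re


def extract_undefined_symbol(message):
    """Pull the mangled symbol name out of a loader error message, if any."""
    match = re.search(r"undefined symbol: ([A-Za-z_][A-Za-z0-9_]*)", message)
    return match.group(1) if match else None


def try_load(path):
    """Try to open a shared library.

    Returns (True, None) on success, or (False, symbol) on failure, where
    symbol is the missing symbol name or None for other kinds of errors.
    """
    try:
        ctypes.CDLL(path)
    except OSError as exc:
        return False, extract_undefined_symbol(str(exc))
    return True, None


if __name__ == "__main__":
    # Path taken from the error message in this report.
    print(try_load("bitsandbytes/libbitsandbytes_rocm64.so"))
```

The extracted name is still C++-mangled; passing it through a demangler such as `c++filt` makes it easier to see which kernel instantiation is missing from the build.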

ROCm Setup failed despite ROCm being available. Please run the following command to get more information:

python -m bitsandbytes

Inspect the output of the command and see if you can locate ROCm libraries. You might need to add them
to your LD_LIBRARY_PATH. If you suspect a bug, please take the information from python -m bitsandbytes
and open an issue at: https://github.com/bitsandbytes-foundation/bitsandbytes/issues

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++ BUG REPORT INFORMATION ++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++ OTHER +++++++++++++++++++++++++++
ROCm specs: rocm_version_string='64', rocm_version_tuple=(6, 4)
PyTorch settings found: ROCM_VERSION=64
The directory listed in your path is found to be non-existent: /opt/ompi/lib
The directory listed in your path is found to be non-existent: /opt/ompi
The directory listed in your path is found to be non-existent: /opt/ucx
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++ DEBUG INFO END ++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Checking that the library is importable and ROCm is callable...
Couldn't load the bitsandbytes library, likely due to missing binaries.
Please ensure bitsandbytes is properly installed.
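
The `rocm_version_string='64'` and `rocm_version_tuple=(6, 4)` values in the diagnostics line up with the library file name `libbitsandbytes_rocm64.so` from the earlier error. The sketch below shows one way that mapping could work. It is an assumption inferred from this output, not a copy of the bitsandbytes logic: the function name `rocm_library_name` is hypothetical, and the sample version string in the usage lines is made up for illustration. On a ROCm build of PyTorch, `torch.version.hip` would be a natural source for the input string.

```python
def rocm_library_name(hip_version):
    """Map a HIP/ROCm version string to (tuple, short string, library name).

    Only the major and minor components are used; anything after the
    second dot (patch level, build hash) is ignored.
    """
    major, minor = (int(part) for part in hip_version.split(".")[:2])
    version_string = f"{major}{minor}"
    library = f"libbitsandbytes_rocm{version_string}.so"
    return (major, minor), version_string, library


if __name__ == "__main__":
    # Hypothetical version string in the usual major.minor.patch-hash shape.
    print(rocm_library_name("6.4.0-abcdef"))
```

If the name derived this way does not match the file the build produced, the loader would look for the wrong file; here the names match, so the failure is in the library contents rather than its name.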

For source installations, compile the binaries with cmake -DCOMPUTE_BACKEND=hip -S .. See the documentation for more details if needed.

Trying a simple check anyway, but this will likely fail...
Traceback (most recent call last):
  File "/var/lib/jenkins/bitsandbytes/bitsandbytes/diagnostics/main.py", line 73, in main
    sanity_check()
  File "/var/lib/jenkins/bitsandbytes/bitsandbytes/diagnostics/main.py", line 42, in sanity_check
    adam.step()
  File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/optim/optimizer.py", line 484, in wrapper
    out = func(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/var/lib/jenkins/bitsandbytes/bitsandbytes/optim/optimizer.py", line 292, in step
    self.update_step(group, p, gindex, pindex)
  File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/var/lib/jenkins/bitsandbytes/bitsandbytes/optim/optimizer.py", line 522, in update_step
    F.optimizer_update_32bit(
  File "/var/lib/jenkins/bitsandbytes/bitsandbytes/functional.py", line 1266, in optimizer_update_32bit
    return backends[g.device.type].optimizer_update_32bit(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/lib/jenkins/bitsandbytes/bitsandbytes/backends/cuda.py", line 780, in optimizer_update_32bit
    optim_func = str2optimizer32bit[optimizer_name][0]
                 ^^^^^^^^^^^^^^^^^^
NameError: name 'str2optimizer32bit' is not defined
Above we output some debug information.
Please provide this info when creating an issue via https://github.com/TimDettmers/bitsandbytes/issues/new/choose
WARNING: Please be sure to sanitize sensitive info from the output before posting it.
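
The `NameError` at the end of this trace is probably a follow-on symptom of the earlier load failure rather than a separate bug. One plausible mechanism, stated here as an assumption and not as a reading of the bitsandbytes source, is that the kernel lookup table is only defined at import time when the native library loaded. The toy module below shows that pattern; every name in it (`native_library`, `optimizer_table`, `lookup_unguarded`, `lookup_guarded`) is hypothetical.

```python
# Stands in for a native library handle that failed to load.
native_library = None

if native_library is not None:
    # Only defined when the native library is available.
    optimizer_table = {"adam": ("fp32_kernel", "bf16_kernel")}


def lookup_unguarded(name):
    """Raises NameError when the table was never defined."""
    return optimizer_table[name][0]


def lookup_guarded(name):
    """Fails early with a message that points at the real cause."""
    if native_library is None:
        raise RuntimeError("native library not loaded; rebuild or check the load error")
    return optimizer_table[name][0]


if __name__ == "__main__":
    try:
        lookup_unguarded("adam")
    except NameError as exc:
        print("unguarded:", exc)
    try:
        lookup_guarded("adam")
    except RuntimeError as exc:
        print("guarded:", exc)
```

If this is what happens, fixing the undefined symbol in the compiled library should make the `NameError` disappear as well.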


Debugging details:

If we install from the latest ROCm/bitsandbytes repo instead, the installation succeeds.

docker run -it --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --device=/dev/kfd --device=/dev/dri -v /:/workspace --group-add video --ipc=host --name bitsandbytes01 rocm/pytorch:rocm6.4_ubuntu24.04_py3.12_pytorch_release_2.4.1

Inside the docker:

git clone -b rocm_enabled_multi_backend https://github.com/ROCm/bitsandbytes.git

cd bitsandbytes
git checkout rocm_enabled_multi_backend
pip install -r requirements-dev.txt
cmake -DCOMPUTE_BACKEND=hip -S . #Use -DBNB_ROCM_ARCH="gfx90a;gfx942" to target specific gpu arch
make
pip install .
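
The optional `-DBNB_ROCM_ARCH` flag mentioned in the comment takes a semicolon-separated list, which must be quoted in a shell because `;` otherwise ends the command. A small sketch of assembling the configure command from a list of targets is shown below. The helper name `cmake_configure_command` is hypothetical; the flags themselves are the ones used in this report.

```python
import shlex


def cmake_configure_command(archs=None, source_dir="."):
    """Build the cmake argument list for a HIP build.

    archs is an optional list of GPU targets such as ["gfx90a", "gfx942"];
    when omitted, the arch flag is left out and the project default applies.
    """
    args = ["cmake", "-DCOMPUTE_BACKEND=hip"]
    if archs:
        args.append("-DBNB_ROCM_ARCH=" + ";".join(archs))
    args += ["-S", source_dir]
    return args


if __name__ == "__main__":
    # shlex.join adds the quoting needed to paste the command into a shell.
    print(shlex.join(cmake_configure_command(["gfx90a", "gfx942"])))
```

Passing the list form to `subprocess.run` avoids the quoting question entirely, since no shell is involved.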

Verify the installation once done,

python -m bitsandbytes


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++ BUG REPORT INFORMATION ++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++ OTHER +++++++++++++++++++++++++++
ROCm specs: rocm_version_string='64', rocm_version_tuple=(6, 4)
PyTorch settings found: ROCM_VERSION=64
The directory listed in your path is found to be non-existent: /opt/ompi/lib
The directory listed in your path is found to be non-existent: /opt/ompi
The directory listed in your path is found to be non-existent: /opt/ucx
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++ DEBUG INFO END ++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Checking that the library is importable and ROCm is callable...
SUCCESS!
Installation was successful!


Expected behavior

Could you please check the differences between the two branches, so that multi-backend-refactor can be installed successfully in the latest ROCm 6.4 environment? Thanks so much!

haic0, Apr 25 '25 09:04