bitsandbytes
Feature Request: ROCm support (AMD GPU)
Could you please add official AMD ROCm support to this library? An unofficial working port already exists:
https://github.com/broncotc/bitsandbytes-rocm
Thank You
Amazing! Thank you for bringing this to my attention. I will try to get in touch with the author of the ROCm library and support AMD GPUs by default.
That would be AMAZING, especially with you recently adding 8-bit support! I tried to make my own merge of the forks, but I don't really know what I'm doing and I don't think I did it correctly.
If the ROCm fork does get merged in, would the Int8 Matmul compatibility improvements also work for AMD GPUs?
@TimDettmers, curious if AMD support any nearer to being merged? @agrocylo made a PR (#296) based somewhat on @broncotc's fork...
EDIT: A slightly newer version branched from v0.37 available here: https://github.com/Titaniumtown/bitsandbytes-rocm/tree/patch-2
The Wikimedia Foundation is really interested in ROCm support too, since Nvidia is not viable for us due to open-source constraints. @TimDettmers we can offer any help (testing/review/etc.) to get this feature merged; it would be really great for the open-source ML ecosystem. Thanks in advance!
Hi, I'm also seeking an AMD-GPU-compatible version. I tried your patch-2 version but it still fails. The error looks like:
File "/home/.local/lib/python3.8/site-packages/bitsandbytes/autograd/__init__.py", line 1, in <module>
from ._functions import undo_layout, get_inverse_transform_indices
File "/home/.local/lib/python3.8/site-packages/bitsandbytes/autograd/_functions.py", line 9, in <module>
import bitsandbytes.functional as F
File "/home/.local/lib/python3.8/site-packages/bitsandbytes/functional.py", line 17, in <module>
from .cextension import COMPILED_WITH_CUDA, lib
File "/home/.local/lib/python3.8/site-packages/bitsandbytes/cextension.py", line 74, in <module>
raise RuntimeError('''
RuntimeError:
CUDA Setup failed despite GPU being available. Inspect the CUDA SETUP outputs above to fix your environment!
If you cannot find any issues and suspect a bug, please open an issue with details about your environment:
https://github.com/TimDettmers/bitsandbytes/issues
I use AMD MI200 card. Do you have any idea on this? Many thanks.
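For context, the bottom of that traceback is bitsandbytes' cextension module, which tries to load a compiled shared library at import time and raises if none was built for the current platform. A minimal sketch of that guard pattern (file names and message here are illustrative, not the library's exact code):

```python
import ctypes
import pathlib

def load_native_lib(libdir, candidates=("libbitsandbytes_hip.so",
                                        "libbitsandbytes_cuda.so")):
    """Return the first loadable backend binary, mimicking an import-time guard."""
    for name in candidates:
        path = pathlib.Path(libdir) / name
        if path.exists():
            # A binary built for the wrong architecture would still fail later.
            return ctypes.CDLL(str(path))
    raise RuntimeError("CUDA Setup failed despite GPU being available.")
```

On ROCm, the fix is usually making sure the build actually produced a HIP binary for your architecture; nothing in the Python layer can recover from a missing one.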
Hello, I was wondering how far off ROCm support is. I'm trying to see if my 7900 XTX will be useful in a project of mine. The Llama 2 quick start guide makes use of bitsandbytes, and as far as I know there aren't any alternatives.
Found this rocm version of bitsandbytes: https://github.com/Lzy17/bitsandbytes-rocm/tree/main
The only ROCm version that worked for me on GFX900 was this one: https://github.com/agrocylo/bitsandbytes-rocm All the others failed to compile/install (ROCm 5.2).
For anyone who needs a patch for RDNA3 cards, I created this fork: https://github.com/st1vms/bitsandbytes-rocm-gfx1100
It patches the Makefile to target the gfx1100 amdgpu module with the latest ROCm and clang 17, and fixes some HIP include warnings. It works with an RX 7900 XT on ROCm 5.7 (with torch-rocm5.7 installed). There should really be a better way of targeting the correct amdgpu module in the build system, though.
Edit: this probably won't work with libraries requiring version > 0.35.
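Rather than hard-coding gfx1100 in the Makefile, a build script could detect the target from `rocminfo`. A rough sketch, where the exact `rocminfo` output format is an assumption:

```python
import re

def detect_gfx_target(rocminfo_output):
    # rocminfo lists the GPU agent name, e.g. "Name: gfx1100"; take the first match.
    match = re.search(r"gfx\d+[a-z]*", rocminfo_output)
    return match.group(0) if match else None

# In a real build script this string would come from running `rocminfo` and
# capturing its stdout; the result could then feed the compiler's offload-arch flag.
print(detect_gfx_target("  Name:                    gfx1100"))  # gfx1100
```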
@st1vms There is a problem.
The fork's BNB version is 0.35.4, which is quite outdated, and the latest version of Peft requires bitsandbytes>=0.37.0
If that fork still works for you, it may be enough to just change the version number. You can test whether the library works with:
python -m bitsandbytes
If it does, try editing the version number in the fork's setup.py before building and installing it, i.e. change it to 0.37.0 and see if Peft works...
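Note that a pin like bitsandbytes>=0.37.0 is only a metadata check on the reported version string, so bumping the number in setup.py satisfies the resolver without adding any of the newer kernels. An illustrative comparison (this toy parser stands in for pip's real PEP 440 logic):

```python
def parse_version(v):
    # Toy parser for dotted numeric versions; real resolvers follow PEP 440.
    return tuple(int(part) for part in v.split("."))

# Peft requires bitsandbytes>=0.37.0; the fork reports 0.35.4.
print(parse_version("0.35.4") >= parse_version("0.37.0"))  # False: rejected
print(parse_version("0.37.0") >= parse_version("0.37.0"))  # True: accepted
```

This is also why a later comment sees the dependency check pass yet still hits runtime crashes: APIs and kernels added after 0.35 remain missing from the fork regardless of the number it reports.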
@st1vms I tried BNB 0.39.0. The dependencies seem fine. However, when I tried to LoRA fine-tune following this notebook: https://colab.research.google.com/drive/1jCkpikz0J2o20FBQmYmAGdiKmJGOMo-o?usp=sharing
the Jupyter kernel crashed (reason: undefined).
Well, the fork is probably already obsolete for some libraries; you should look for updated ones.
@st1vms I retried with a new virtual env and changed from .ipynb to .py. This is the result:
(torch3) win@win-MS-7E02:/mnt/1df6b45e-20dc-41ca-9a04-b271fd3a4940/Learn$ /usr/bin/env /home/win/torch3/bin/python /home/win/.vscode-oss/extensions/ms-python.python-2023.20.0-universal/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher 60843 -- /mnt/1df6b45e-20dc-41ca-9a04-b271fd3a4940/Learn/finetune.py
/home/win/torch3/lib/python3.10/site-packages/transformers/utils/generic.py:441: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
_torch_pytree._register_pytree_node(
/home/win/torch3/lib/python3.10/site-packages/transformers/utils/generic.py:309: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
_torch_pytree._register_pytree_node(
/home/win/torch3/lib/python3.10/site-packages/transformers/utils/generic.py:309: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
_torch_pytree._register_pytree_node(
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:14<00:00, 7.21s/it]
trainable params: 8388608 || all params: 6666862592 || trainable%: 0.12582542214183376
0%| | 0/200 [00:00<?, ?it/s]You're using a GPT2TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
/home/win/torch3/lib/python3.10/site-packages/torch/utils/checkpoint.py:461: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.
warnings.warn(
/home/win/torch3/lib/python3.10/site-packages/bitsandbytes-0.41.0-py3.10.egg/bitsandbytes/autograd/_functions.py:231: UserWarning: MatMul8bitLt: inputs will be cast from torch.float32 to float16 during quantization
warnings.warn(f"MatMul8bitLt: inputs will be cast from {A.dtype} to float16 during quantization")
============================================= ERROR: Your GPU does not support Int8 Matmul!
python: /mnt/1df6b45e-20dc-41ca-9a04-b271fd3a4940/bitsandbytes-rocm-gfx1100/csrc/ops.cu:347: int igemmlt(cublasLtHandle_t, int, int, int, const int8_t *, const int8_t *, void *, float *, int, int, int) [FORMATB = 3, DTYPE_OUT = 32, SCALE_ROWS = 0]: Assertion `false' failed.
Can someone post to this thread any updated forks? The lack of proper BnB support is really holding back the AMD cards.
Looks like things may finally move forward with official support in the not too distant future! Hope with ROCm 6.x we can finally see support merged into this repo.
Sorry for taking so long on this. I am currently onboarding more maintainers and we should see some progress on this very soon. This is one of our high-priority issues.
Would love to see ROCM support, keep doing your good work
if I may ask, what's the progress so far?
If you haven't already seen it, there was a comment made in the discussions with an accompanying tracking issue for general cross-platform support rather than just AMD/ROCM support. To that end it appears it is currently in the planning phase.
@TimDettmers @Titus-von-Koeller , we are at ~95% parity for bnb for https://github.com/ROCm/bitsandbytes/tree/rocm_enabled on Instinct class gpus, and working to close the gaps on Navi. At this point, we should be seriously considering upstreaming. Could you drop me an email at [email protected], and we can set up a call to discuss further. cc: @sunway513 @Lzy17 @pnunna93
@amathews-amd I tried compiling the ROCm version of BnB from the rocm_enabled branch, but it fails with errors on an AMD MI250x. Do you have any suggestions for how to resolve the issue?
@chauhang Could you try with rocm 6.0? You can use this docker - rocm/pytorch:rocm6.0.2_ubuntu22.04_py3.10_pytorch_2.1.2 and install bitsandbytes directly.
@pnunna93 I am already using ROCm 6.0 -- have added details of the pytorch environment here.
@chauhang, you can skip the hipblaslt update and install bitsandbytes directly then. Please let me know if you face any issues.
I was using arlo-phoenix fork. https://github.com/arlo-phoenix/bitsandbytes-rocm-5.6/tree/rocm
Should I use the ROCm fork instead? https://github.com/ROCm/bitsandbytes/tree/rocm_enabled
Yes, it's updated for ROCm 6.
I've often had trouble understanding the state of GPU support in ROCm. So with that said, I have some clarification questions:
- Can we clarify what we mean by "Instinct-class" GPUs?
  - The ROCm 6.0.2 docs suggest to me this is all CDNA, so MI100 and newer? Or is the MI50 expected to work also?
- What is the intention for Navi support?
  - Is this for RDNA2/RDNA3 only?
  - Is there intent to support ROCm < 6?
I'd like to be able to help get this merged, but need to figure out the constraints. The only AMD GPUs that I have on hand (RX 570 and R9 270X) aren't going to cut it.
The other issue is how far behind main this is. Ideally this could be implemented as a separate backend as proposed in #898. We would want to switch to CMake for building. I also think it'd be better to unify the C++/CUDA code with the hipify code and handle most of the changes with conditional compilation.