
Compatibility with Blackwell

Open miweru opened this issue 9 months ago • 8 comments

Hi, when using your script for installing unsloth:

conda create --name unsloth_env \
    python=3.11 \
    pytorch-cuda=12.1 \
    pytorch cudatoolkit xformers \
    -c pytorch -c nvidia -c xformers -y
conda activate unsloth_env

pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
pip install --no-deps trl peft accelerate bitsandbytes

there is an error when loading unsloth:

NVIDIA GeForce RTX 5090 with CUDA capability sm_120 is not compatible with the current PyTorch installation. The current PyTorch install supports CUDA capabilities sm_50 sm_60 sm_61 sm_70 sm_75 sm_80 sm_86 sm_90.
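The incompatibility in this message can be illustrated without a GPU: the stable PyTorch wheel is built for a fixed list of compute capabilities (the same list `torch.cuda.get_arch_list()` reports at runtime), and sm_120 is simply not in it. A minimal sketch, with the list copied from the error message above:

```python
# Compute capabilities baked into the stable PyTorch wheel, copied from the
# error message above; torch.cuda.get_arch_list() reports the same list.
supported_archs = ["sm_50", "sm_60", "sm_61", "sm_70",
                   "sm_75", "sm_80", "sm_86", "sm_90"]

gpu_arch = "sm_120"  # RTX 5090 (Blackwell)
print(gpu_arch in supported_archs)  # False -> the wheel cannot target this GPU
```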

After installing pytorch with the new cuda:

pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128

Then I had to compile bitsandbytes from source, along with Triton (which failed) and xformers, because their default releases do not yet work with Blackwell (or at least with CUDA 12.8).

I now try installing unsloth into the nvcr.io/nvidia/pytorch:25.01-py3 docker image...

miweru avatar Feb 12 '25 17:02 miweru

Oh no, unfortunately because we rely on Triton, Pytorch etc, if they don't work for the new GPUs then we won't either. Hopefully they support the new 50 series soon

shimmyshimmer avatar Feb 12 '25 19:02 shimmyshimmer

> Oh no, unfortunately because we rely on Triton, Pytorch etc, if they don't work for the new GPUs then we won't either. Hopefully they support the new 50 series soon

Seems they have enabled Blackwell support: nightly PyTorch and the Triton main branch. Need to use CUDA 12.8.

ttio2tech avatar Feb 14 '25 11:02 ttio2tech

I have a very similar problem on an H200 server. CUDA 12.8 is installed and I use a nightly build to install torch (with visible GPUs). I cannot pip install bitsandbytes and must build it from source, but that succeeds and loads fine.

However, if I pip install unsloth it appears to downgrade CUDA to 12.4.

When I do "from unsloth import FastLanguageModel" I get:

File ~/debug/lib/python3.12/site-packages/unsloth/kernels/utils.py:65
     63 import bitsandbytes as bnb
     64 # https://github.com/bitsandbytes-foundation/bitsandbytes/pull/1330/files
---> 65 HAS_CUDA_STREAM = Version(bnb.__version__) > Version("0.43.3")
     66 global CUDA_STREAM
     67 CUDA_STREAM = None

AttributeError: module 'bitsandbytes' has no attribute '__version__'
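This AttributeError is typical of a bitsandbytes install whose compiled extension never loaded, leaving the module importable but missing `__version__`. A hedged sketch of a more defensive version lookup that falls back to package metadata (the `FakeBnb` class and package name below are stand-ins, not the real module):

```python
from importlib import metadata

def pkg_version(name, module=None):
    """Version tuple, preferring module.__version__ with a metadata fallback."""
    v = getattr(module, "__version__", None)
    if v is None:
        try:
            v = metadata.version(name)
        except metadata.PackageNotFoundError:
            return None
    return tuple(int(p) for p in v.split(".")[:3] if p.isdigit())

class FakeBnb:
    """Stand-in for a bitsandbytes module whose native extension failed to load."""

# unsloth needs bitsandbytes > 0.43.3 for its CUDA-stream path; with
# __version__ missing and the package absent we get None instead of an
# AttributeError:
print(pkg_version("bitsandbytes-stand-in", FakeBnb()))  # None when not installed
```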

simusid avatar Feb 23 '25 19:02 simusid

We released bitsandbytes 0.45.3 wheels with CUDA 12.8 builds for Blackwell yesterday.

matthewdouglas avatar Feb 25 '25 19:02 matthewdouglas

Hurray! I can confirm 100% success on a DGX-H200 today with these updates (plus possibly also bitsandbytes updates too)! Thanks everyone!

simusid avatar Feb 26 '25 19:02 simusid

Hi @simusid, could you please guide me through how you managed to do it?

I do own an RTX 5090 and tried installing the following deps:

- Triton: pip install -U --index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/Triton-Nightly/pypi/simple/ triton-nightly
- PyTorch: pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu118
- Bitsandbytes: pip install bitsandbytes
- Unsloth: tried both pip install unsloth and pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"

Even though it looks like everything installs fine, running the training cell crashes the kernel.

oteroantoniogom avatar Feb 28 '25 09:02 oteroantoniogom

@oteroantoniogom,

The only difference I see is that I installed torch with https://download.pytorch.org/whl/nightly/cu128 because that matches the driver installed by the sysadmin.

simusid avatar Mar 01 '25 12:03 simusid

@simusid Yeah, my bad, I copy-pasted the wrong one, even though I had installed that one. It still says 'sm_120 is not a recognized processor for this target' though. If I'm not wrong, it's a problem with Triton, but I cannot build it from source either.

oteroantoniogom avatar Mar 03 '25 12:03 oteroantoniogom

Triton now supports Blackwell.

itay-grudev avatar Mar 30 '25 03:03 itay-grudev

I compiled triton, pytorch, and vllm, installed unsloth, and all unsloth scripts are working on a 5070 Ti with:

os.environ["VLLM_FLASH_ATTN_VERSION"] = "2"
os.environ["VLLM_USE_V1"] = "0"

I tried every day with git pull: first pytorch worked, then triton after several git pulls and recompilations, and finally vllm is working with the unsloth scripts.
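One note on the workaround above: vLLM typically reads these environment variables when it initializes, so the safe order is to set them before importing vllm. A minimal sketch of that ordering:

```python
import os

# Set these before "import vllm" so vLLM picks them up at initialization.
# Values from the report above: force FlashAttention 2 and the legacy v0 engine.
os.environ["VLLM_FLASH_ATTN_VERSION"] = "2"
os.environ["VLLM_USE_V1"] = "0"
print(os.environ["VLLM_FLASH_ATTN_VERSION"], os.environ["VLLM_USE_V1"])  # 2 0
```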

TRITON

git clone https://github.com/triton-lang/triton.git
cd triton
pip install -r python/requirements.txt # build-time dependencies
cd python
MAX_JOBS=2 python setup.py bdist_wheel
pip install dist/xxx

PYTORCH

git clone https://github.com/pytorch/pytorch
cd pytorch
export CFLAGS+=" -Wno-error=maybe-uninitialized -Wno-error=uninitialized -Wno-error=restrict"
export CXXFLAGS+=" -Wno-error=maybe-uninitialized -Wno-error=uninitialized -Wno-error=restrict"
git submodule sync
git submodule update --init --recursive -j 8
pip install -r requirements.txt
pip install mkl-static mkl-include wheel
# Build PyTorch (will take a long time)
export CUDA_HOME=/opt/cuda
export CUDA_PATH=$CUDA_HOME
export TORCH_CUDA_ARCH_LIST=Blackwell
MAX_JOBS=2 python setup.py bdist_wheel
pip install dist/xxx

VLLM

git clone https://github.com/vllm-project/vllm.git
cd vllm
export CUDA_HOME=/opt/cuda
export CUDA_PATH=$CUDA_HOME
export TORCH_CUDA_ARCH_LIST=Blackwell
# trying to set other than only cuda libs
export USE_CUDNN=1
export USE_CUSPARSELT=1
export USE_CUFILE=1
export USE_CUDSS=0
export CMAKE_ARGS="-DUSE_CUDNN=1 -DUSE_CUSPARSELT=1 -DUSE_CUDSS=0 -DUSE_CUFILE=1"
# Build vllm (will take a long time)
export CUDA_HOME=/opt/cuda
python use_existing_torch.py
pip install -r requirements/build.txt
pip install setuptools_scm
MAX_JOBS=1 python setup.py bdist_wheel
pip install dist/xxx
pip install bitsandbytes
pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"

pwasiewi avatar Apr 03 '25 18:04 pwasiewi

@g0t4 @hailangge @hongbo-miao @itay-grudev @matthewdouglas @miweru @nernjn @oteroantoniogom @pwasiewi @qingy1337 @simusid @ttio2tech

Hello,

We just added a blackwell support installation guide here: https://github.com/unslothai/unsloth/tree/main/blackwell

In case you're looking for the conda-compatible version, we have an extended version with conda/mamba instructions here, but it's still a draft: https://github.com/unslothai/unsloth/blob/edd7d28114299f6c34bc4dbec9806cd41d9b9dfe/blackwell/README.md

Would appreciate your feedback after you check out and follow the prescribed installation instructions. Thank you!

rolandtannous avatar Jun 29 '25 22:06 rolandtannous

Thanks @rolandtannous
It works using https://github.com/unslothai/unsloth/tree/main/blackwell. The qwen3 test script gives an error: "'Qwen3ForCausalLM' object has no attribute 'disable_gradient_checkpointing'", but that is in the inference part. After commenting out the model.disable_gradient_checkpointing() call (line 418), it works without issue.
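An alternative to commenting the line out is to guard the call so the script also runs on model classes that lack the method. A hedged sketch; the class below is a stand-in, not the real transformers class:

```python
# Hypothetical stand-in for a model class that lacks the method in question.
class Qwen3ForCausalLM:
    pass

model = Qwen3ForCausalLM()

# Guarded version of the failing line: only call the method when it exists,
# which is what commenting out line 418 amounts to on this model.
if hasattr(model, "disable_gradient_checkpointing"):
    model.disable_gradient_checkpointing()
else:
    print("disable_gradient_checkpointing() not available; skipping")
```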

ttio2tech avatar Jun 30 '25 11:06 ttio2tech

I followed the instructions but the xformers setup gave a ton of errors, the gist of it seems to be: nvcc fatal : unsupported architecture 'compute_120'...

glandeurlessard avatar Jun 30 '25 19:06 glandeurlessard

> I followed the instructions but the xformers setup gave a ton of errors, the gist of it seems to be: nvcc fatal : unsupported architecture 'compute_120'...

Blackwell requires CUDA 12.8.
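That "nvcc fatal" error means the local toolkit predates compute_120 support. A hypothetical helper that mirrors the version requirement (parse the "release X.Y" number nvcc reports and compare it to the 12.8 minimum):

```python
# Hypothetical check: does a given CUDA toolkit release support Blackwell
# (sm_120 / compute_120)? The 12.8 floor comes from the comment above.
def cuda_supports_blackwell(release: str) -> bool:
    major, minor = (int(x) for x in release.split(".")[:2])
    return (major, minor) >= (12, 8)

print(cuda_supports_blackwell("12.8"))  # True  -> compute_120 available
print(cuda_supports_blackwell("12.4"))  # False -> nvcc fatal on compute_120
```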

ttio2tech avatar Jul 04 '25 10:07 ttio2tech