Compatibility with Blackwell
Hi, when using your script for installing unsloth
conda create --name unsloth_env \
    python=3.11 \
    pytorch-cuda=12.1 \
    pytorch cudatoolkit xformers -c pytorch -c nvidia -c xformers \
    -y
conda activate unsloth_env
pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
pip install --no-deps trl peft accelerate bitsandbytes
There is an error when loading unsloth:
NVIDIA GeForce RTX 5090 with CUDA capability sm_120 is not compatible with the current PyTorch installation. The current PyTorch install supports CUDA capabilities sm_50 sm_60 sm_61 sm_70 sm_75 sm_80 sm_86 sm_90.
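A quick way to confirm this mismatch yourself (a minimal diagnostic sketch, nothing unsloth-specific) is to compare the card's compute capability against the architectures the installed PyTorch wheel was built for:
import torch
print(torch.cuda.get_device_capability(0))  # e.g. (12, 0) for sm_120 on a 5090
print(torch.cuda.get_arch_list())           # architectures in the installed wheel; no sm_120 entry means the wheel cannot target the card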
After installing PyTorch with the new CUDA:
pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128
Then I had to compile bitsandbytes from source, as well as triton (which failed) and xformers, because their default versions do not work with Blackwell (or at least CUDA 12.8) yet.
I am now trying to install unsloth into the nvcr.io/nvidia/pytorch:25.01-py3 Docker image...
Oh no, unfortunately because we rely on Triton, PyTorch, etc., if they don't work for the new GPUs then we won't either. Hopefully they support the new 50 series soon.
Seems they have enabled Blackwell support: nightly PyTorch and the Triton main branch. You need to use CUDA 12.8.
I have a very similar problem on an H200 server. CUDA 12.8 is installed and I used a nightly build to install torch (with visible GPUs). I cannot pip install bitsandbytes and must build it from source, but that succeeds and loads fine.
However, if I pip install unsloth it appears to downgrade CUDA to 12.4.
When I do "from unsloth import FastLanguageModel" I get:
File ~/debug/lib/python3.12/site-packages/unsloth/kernels/utils.py:65
     63 import bitsandbytes as bnb
     64 # https://github.com/bitsandbytes-foundation/bitsandbytes/pull/1330/files
---> 65 HAS_CUDA_STREAM = Version(bnb.__version__) > Version("0.43.3")
     66 global CUDA_STREAM
     67 CUDA_STREAM = None

AttributeError: module 'bitsandbytes' has no attribute '__version__'
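For what it's worth, pip's package metadata can still be queried even when the module attribute is missing, which helps tell a broken native build apart from a missing install (a standard-library sketch):
import importlib.metadata
import bitsandbytes as bnb
print(importlib.metadata.version("bitsandbytes"))        # version pip recorded at install time
print(getattr(bnb, "__version__", "attribute missing"))  # the attribute unsloth's utils.py reads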
We released bitsandbytes 0.45.3 wheels with CUDA 12.8 builds for Blackwell yesterday.
Hurray! I can confirm 100% success on a DGX-H200 today with these updates (plus possibly the bitsandbytes updates too)! Thanks everyone!
Hi @simusid, could you please guide me through how you managed to do it?
I do own an RTX 5090 and tried installing the following deps:
- Triton: pip install -U --index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/Triton-Nightly/pypi/simple/ triton-nightly
- PyTorch: pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu118
- Bitsandbytes: pip install bitsandbytes
- Unsloth: tried both pip install unsloth and pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
Even though it looks like everything installs fine, running the training cell crashes the kernel.
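One generic way to at least see where it dies (a standard-library sketch, nothing unsloth-specific) is to enable the fault handler at the top of the notebook before training:
import faulthandler
faulthandler.enable()  # dumps a Python traceback when the process is killed by a fatal signal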
@oteroantoniogom,
The only difference I see is that I installed torch with https://download.pytorch.org/whl/nightly/cu128 because that matches the driver installed by the sysadmin.
@simusid Yeah, my bad, I copy-pasted the wrong one, even though I had installed that one. It still says 'sm_120 is not a recognized processor for this target', though. If I'm not wrong it's a problem with Triton, but I can't build it from source either.
Triton now supports Blackwell.
I compiled triton, pytorch, and vllm, installed unsloth, and all unsloth scripts are working on a 5070 Ti with:
os.environ["VLLM_FLASH_ATTN_VERSION"] = "2"
os.environ["VLLM_USE_V1"] = "0"
I tried every day with git pull: first pytorch was working, then triton after several git pulls and recompilations, and finally vllm is working with the unsloth scripts.
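For reference, a minimal sketch of how I'd assume these flags are meant to be used: both are read by vllm during setup, so they go before any vllm/unsloth import.
import os
os.environ["VLLM_FLASH_ATTN_VERSION"] = "2"  # fall back to FlashAttention 2
os.environ["VLLM_USE_V1"] = "0"              # stay on the V0 engine
from unsloth import FastLanguageModel        # import only after the flags are set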
TRITON
git clone https://github.com/triton-lang/triton.git
cd triton
pip install -r python/requirements.txt # build-time dependencies
cd python
MAX_JOBS=2 python setup.py bdist_wheel
pip install dist/xxx  # replace xxx with the wheel filename produced by the build
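As a quick smoke test for the freshly built wheel (a minimal sketch; the kernel is just the standard vector-add example), you can check that Triton actually compiles for the card:
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

x = torch.randn(4096, device="cuda")
y = torch.randn(4096, device="cuda")
out = torch.empty_like(x)
add_kernel[(triton.cdiv(4096, 1024),)](x, y, out, 4096, BLOCK_SIZE=1024)
assert torch.allclose(out, x + y)
print("Triton kernel compiled and ran on", torch.cuda.get_device_name(0))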
PYTORCH
git clone https://github.com/pytorch/pytorch
cd pytorch
export CFLAGS+=" -Wno-error=maybe-uninitialized -Wno-error=uninitialized -Wno-error=restrict"
export CXXFLAGS+=" -Wno-error=maybe-uninitialized -Wno-error=uninitialized -Wno-error=restrict"
git submodule sync
git submodule update --init --recursive -j 8
pip install -r requirements.txt
pip install mkl-static mkl-include wheel
# Build PyTorch (will take a long time)
export CUDA_HOME=/opt/cuda
export CUDA_PATH=$CUDA_HOME
export TORCH_CUDA_ARCH_LIST=Blackwell
MAX_JOBS=2 python setup.py bdist_wheel
pip install dist/xxx  # replace xxx with the wheel filename produced by the build
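After installing the wheel, a short check (just a sketch) that the build really targets Blackwell:
import torch
print(torch.__version__, torch.version.cuda)  # should report your source build and CUDA 12.8
print(torch.cuda.get_arch_list())             # should now include an sm_120 entry
x = torch.randn(1024, 1024, device="cuda")
print((x @ x).sum().item())                   # trivial matmul to force a real kernel launch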
VLLM
git clone https://github.com/vllm-project/vllm.git
cd vllm
export CUDA_HOME=/opt/cuda
export CUDA_PATH=$CUDA_HOME
export TORCH_CUDA_ARCH_LIST=Blackwell
# try enabling additional CUDA libraries beyond the defaults
export USE_CUDNN=1
export USE_CUSPARSELT=1
export USE_CUFILE=1
export USE_CUDSS=0
export CMAKE_ARGS="-DUSE_CUDNN=1 -DUSE_CUSPARSELT=1 -DUSE_CUDSS=0 -DUSE_CUFILE=1"
# Build vllm (will take a long time)
export CUDA_HOME=/opt/cuda
python use_existing_torch.py
pip install -r requirements/build.txt
pip install setuptools_scm
MAX_JOBS=1 python setup.py bdist_wheel
pip install dist/xxx  # replace xxx with the wheel filename produced by the build
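And a minimal import check for the vllm wheel (a sketch; it only verifies the package loads against the self-built torch):
import vllm
print(vllm.__version__)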
- pip install bitsandbytes
- pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
@g0t4 @hailangge @hongbo-miao @itay-grudev @matthewdouglas @miweru @nernjn @oteroantoniogom @pwasiewi @qingy1337 @simusid @ttio2tech
Hello,
We just added a blackwell support installation guide here: https://github.com/unslothai/unsloth/tree/main/blackwell
In case you're looking for the conda compatible version, we have an extended version with conda/mamba instructions here but it's still a draft: https://github.com/unslothai/unsloth/blob/edd7d28114299f6c34bc4dbec9806cd41d9b9dfe/blackwell/README.md
Would appreciate your feedback after you check out and follow the prescribed installation instructions. Thank you!
Thanks @rolandtannous
It works using https://github.com/unslothai/unsloth/tree/main/blackwell
The qwen3 test script gives an error: "'Qwen3ForCausalLM' object has no attribute 'disable_gradient_checkpointing'", but it's only in the inference part. After commenting out the model.disable_gradient_checkpointing() call (line 418), it works without issue.
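Instead of deleting the line, a defensive guard would keep the script running on both kinds of models (a sketch; gradient_checkpointing_disable is the standard transformers spelling, assuming an HF-style model):
if hasattr(model, "disable_gradient_checkpointing"):
    model.disable_gradient_checkpointing()
elif hasattr(model, "gradient_checkpointing_disable"):
    model.gradient_checkpointing_disable()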
I followed the instructions, but the xformers setup gave a ton of errors; the gist of it seems to be: nvcc fatal : unsupported architecture 'compute_120'...
Blackwell requires CUDA 12.8; nvcc only accepts compute_120 from a 12.8 toolkit, so check which toolkit is on your PATH.