ImportError: Unsloth: CUDA is not linked properly.
I followed the conda installation instructions in the README:
conda create --name unsloth_env python=3.10
conda activate unsloth_env
conda install pytorch cudatoolkit torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
conda install xformers -c xformers
pip install bitsandbytes
pip install "unsloth[conda] @ git+https://github.com/unslothai/unsloth.git"
There were no issues during the install. However, when I try to import from unsloth, I get an error.
from unsloth import FastLanguageModel
Results in an error:
/home/ubuntu/miniconda/envs/unsloth_env/lib/python3.10/site-packages/bitsandbytes/cextension.py:34: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
warn("The installed version of bitsandbytes was compiled without GPU support. "
/home/ubuntu/miniconda/envs/unsloth_env/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so: undefined symbol: cadam32bit_grad_fp32
/home/ubuntu/miniconda/envs/unsloth_env/lib/python3.10/site-packages/unsloth/__init__.py:71: UserWarning: Unsloth: Running `ldconfig /usr/lib64-nvidia` to link CUDA.
warnings.warn(
/sbin/ldconfig.real: Can't create temporary cache file /etc/ld.so.cache~: Permission denied
Traceback (most recent call last):
File "/home/ubuntu/miniconda/envs/unsloth_env/lib/python3.10/site-packages/unsloth/__init__.py", line 68, in <module>
cdequantize_blockwise_fp32 = bnb.functional.lib.cdequantize_blockwise_fp32
File "/home/ubuntu/miniconda/envs/unsloth_env/lib/python3.10/ctypes/__init__.py", line 387, in __getattr__
func = self.__getitem__(name)
File "/home/ubuntu/miniconda/envs/unsloth_env/lib/python3.10/ctypes/__init__.py", line 392, in __getitem__
func = self._FuncPtr((name_or_ordinal, self))
AttributeError: /home/ubuntu/miniconda/envs/unsloth_env/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so: undefined symbol: cdequantize_blockwise_fp32
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/ubuntu/miniconda/envs/unsloth_env/lib/python3.10/site-packages/unsloth/__init__.py", line 99, in <module>
cdequantize_blockwise_fp32 = bnb.functional.lib.cdequantize_blockwise_fp32
File "/home/ubuntu/miniconda/envs/unsloth_env/lib/python3.10/ctypes/__init__.py", line 387, in __getattr__
func = self.__getitem__(name)
File "/home/ubuntu/miniconda/envs/unsloth_env/lib/python3.10/ctypes/__init__.py", line 392, in __getitem__
func = self._FuncPtr((name_or_ordinal, self))
AttributeError: /home/ubuntu/miniconda/envs/unsloth_env/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so: undefined symbol: cdequantize_blockwise_fp32
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/ubuntu/miniconda/envs/unsloth_env/lib/python3.10/site-packages/unsloth/__init__.py", line 102, in <module>
raise ImportError("Unsloth: CUDA is not linked properly.\n"\
ImportError: Unsloth: CUDA is not linked properly.
We tried running `ldconfig /usr/lib64-nvidia` ourselves, but it didn't work.
You need to run in your terminal `sudo ldconfig /usr/lib64-nvidia` yourself, then import Unsloth.
Also try `sudo ldconfig /usr/local/cuda-xx.x` - find the latest cuda version.
I looked for both /usr/lib64-nvidia and files matching /usr/local/cuda-*, but found neither.
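As an aside, since neither directory exists, one way to see where conda's pytorch-cuda packages actually put the CUDA runtime is a small sketch like the one below (the glob patterns are assumptions and may need adjusting):

import glob
import os
import sys

# Conda's pytorch-cuda ships libcudart inside the environment itself,
# not under /usr/local/cuda-*, so search the active env.
env_root = sys.prefix  # e.g. /home/ubuntu/miniconda/envs/unsloth_env
patterns = [
    "lib/libcudart.so*",                             # conda-installed CUDA runtime
    "lib/python*/site-packages/nvidia/*/lib/*.so*",  # pip-installed nvidia-* wheels
]
for pattern in patterns:
    for path in sorted(glob.glob(os.path.join(env_root, pattern))):
        print(path)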
I have the NVIDIA driver installed fine for cuda=12.2. Here is the output of my nvidia-smi command:
Tue Mar 5 01:23:41 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.161.07 Driver Version: 535.161.07 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 Tesla T4 Off | 00000000:00:1B.0 Off | 0 |
| N/A 27C P0 24W / 70W | 2MiB / 15360MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 1 Tesla T4 Off | 00000000:00:1C.0 Off | 0 |
| N/A 27C P0 25W / 70W | 2MiB / 15360MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 2 Tesla T4 Off | 00000000:00:1D.0 Off | 0 |
| N/A 27C P0 24W / 70W | 2MiB / 15360MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 3 Tesla T4 Off | 00000000:00:1E.0 Off | 0 |
| N/A 28C P0 25W / 70W | 2MiB / 15360MiB | 6% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| No running processes found |
+---------------------------------------------------------------------------------------+
I'm running Ubuntu 22.04.3 on an AWS EC2 g4dn.12xlarge which has T4 GPUs.
Sorry about the issue :(
There is another way to install it, if this works (also first check whether mamba works in the terminal, since it makes installing faster by reducing solve time :)):
conda create --name unsloth_env python=3.10
conda activate unsloth_env
conda install pytorch cudatoolkit torchvision torchaudio pytorch-cuda=12.1 bitsandbytes -c pytorch -c nvidia -c conda-forge
conda install xformers -c xformers
pip install "unsloth[conda] @ git+https://github.com/unslothai/unsloth.git"
If that doesn't work, also try folding in xformers:
conda create --name unsloth_env python=3.10
conda activate unsloth_env
conda install pytorch cudatoolkit torchvision torchaudio xformers pytorch-cuda=12.1 bitsandbytes -c pytorch -c nvidia -c conda-forge -c xformers
pip install "unsloth[conda] @ git+https://github.com/unslothai/unsloth.git"
Hi, thanks for the response. The first suggestion did not work. The second did at least change the error:
>>> from unsloth import FastLanguageModel
False
===================================BUG REPORT===================================
/home/ubuntu/miniconda/envs/unsloth-test/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:167: UserWarning: Welcome to bitsandbytes. For bug reports, please run
python -m bitsandbytes
warn(msg)
================================================================================
/home/ubuntu/miniconda/envs/unsloth-test/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:167: UserWarning: Found duplicate ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] files: {PosixPath('/home/ubuntu/miniconda/envs/unsloth-test/lib/libcudart.so'), PosixPath('/home/ubuntu/miniconda/envs/unsloth-test/lib/libcudart.so.11.0')}.. We select the PyTorch default libcudart.so, which is {torch.version.cuda},but this might missmatch with the CUDA version that is needed for bitsandbytes.To override this behavior set the BNB_CUDA_VERSION=<version string, e.g. 122> environmental variableFor example, if you want to use the CUDA version 122BNB_CUDA_VERSION=122 python ...OR set the environmental variable in your .bashrc: export BNB_CUDA_VERSION=122In the case of a manual override, make sure you set the LD_LIBRARY_PATH, e.g.export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-11.2
warn(msg)
/home/ubuntu/miniconda/envs/unsloth-test/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:167: UserWarning: /home/ubuntu/miniconda/envs/unsloth-test did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
warn(msg)
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...
The following directories listed in your path were found to be non-existent: {PosixPath('/usr/local/cuda/lib64')}
DEBUG: Possible options found for libcudart.so: set()
CUDA SETUP: PyTorch settings found: CUDA_VERSION=121, Highest Compute Capability: 7.5.
CUDA SETUP: To manually override the PyTorch CUDA version please see:https://github.com/TimDettmers/bitsandbytes/blob/main/how_to_use_nonpytorch_cuda.md
CUDA SETUP: Required library version not found: libbitsandbytes_cuda121.so. Maybe you need to compile it from source?
CUDA SETUP: Defaulting to libbitsandbytes_cpu.so...
================================================ERROR=====================================
CUDA SETUP: CUDA detection failed! Possible reasons:
1. You need to manually override the PyTorch CUDA version. Please see: "https://github.com/TimDettmers/bitsandbytes/blob/main/how_to_use_nonpytorch_cuda.md
2. CUDA driver not installed
3. CUDA not installed
4. You have multiple conflicting CUDA libraries
5. Required library not pre-compiled for this bitsandbytes release!
CUDA SETUP: If you compiled from source, try again with `make CUDA_VERSION=DETECTED_CUDA_VERSION` for example, `make CUDA_VERSION=113`.
CUDA SETUP: The CUDA version for the compile might depend on your conda install. Inspect CUDA version via `conda list | grep cuda`.
================================================================================
CUDA SETUP: Problem: The main issue seems to be that the main CUDA runtime library was not detected.
CUDA SETUP: Solution 1: To solve the issue the libcudart.so location needs to be added to the LD_LIBRARY_PATH variable
CUDA SETUP: Solution 1a): Find the cuda runtime library via: find / -name libcudart.so 2>/dev/null
CUDA SETUP: Solution 1b): Once the library is found add it to the LD_LIBRARY_PATH: export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:FOUND_PATH_FROM_1a
CUDA SETUP: Solution 1c): For a permanent solution add the export from 1b into your .bashrc file, located at ~/.bashrc
CUDA SETUP: Solution 2: If no library was found in step 1a) you need to install CUDA.
CUDA SETUP: Solution 2a): Download CUDA install script: wget https://raw.githubusercontent.com/TimDettmers/bitsandbytes/main/cuda_install.sh
CUDA SETUP: Solution 2b): Install desired CUDA version to desired location. The syntax is bash cuda_install.sh CUDA_VERSION PATH_TO_INSTALL_INTO.
CUDA SETUP: Solution 2b): For example, "bash cuda_install.sh 113 ~/local/" will download CUDA 11.3 and install into the folder ~/local
CUDA SETUP: Setup Failed!
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/ubuntu/miniconda/envs/unsloth-test/lib/python3.10/site-packages/unsloth/__init__.py", line 59, in <module>
import bitsandbytes as bnb
File "/home/ubuntu/miniconda/envs/unsloth-test/lib/python3.10/site-packages/bitsandbytes/__init__.py", line 6, in <module>
from . import cuda_setup, utils, research
File "/home/ubuntu/miniconda/envs/unsloth-test/lib/python3.10/site-packages/bitsandbytes/research/__init__.py", line 1, in <module>
from . import nn
File "/home/ubuntu/miniconda/envs/unsloth-test/lib/python3.10/site-packages/bitsandbytes/research/nn/__init__.py", line 1, in <module>
from .modules import LinearFP8Mixed, LinearFP8Global
File "/home/ubuntu/miniconda/envs/unsloth-test/lib/python3.10/site-packages/bitsandbytes/research/nn/modules.py", line 8, in <module>
from bitsandbytes.optim import GlobalOptimManager
File "/home/ubuntu/miniconda/envs/unsloth-test/lib/python3.10/site-packages/bitsandbytes/optim/__init__.py", line 6, in <module>
from bitsandbytes.cextension import COMPILED_WITH_CUDA
File "/home/ubuntu/miniconda/envs/unsloth-test/lib/python3.10/site-packages/bitsandbytes/cextension.py", line 20, in <module>
raise RuntimeError('''
RuntimeError:
CUDA Setup failed despite GPU being available. Please run the following command to get more information:
python -m bitsandbytes
Inspect the output of the command and see if you can locate CUDA libraries. You might need to add them
to your LD_LIBRARY_PATH. If you suspect a bug, please take the information from python -m bitsandbytes
and open an issue at: https://github.com/TimDettmers/bitsandbytes/issues
I will look into this a bit more, but if you have seen this before and know what to do please let me know.
I did notice that if I installed xformers after pytorch-cuda, it changed my pytorch version to one that was not CUDA-compatible:
The following packages will be SUPERSEDED by a higher-priority channel:
pytorch pytorch::pytorch-2.2.1-py3.10_cuda12.~ --> pkgs/main::pytorch-2.2.0-cpu_py310hdc00b08_0
But this seems to be solved by folding the xformers installation into the single conda install command, as you suggested.
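As a quick check that the conda solve did not leave a CPU-only PyTorch in place (a small snippet of my own, not from the install instructions):

import torch

print(torch.__version__)          # should be a CUDA build from the pytorch channel
print(torch.version.cuda)         # None means a CPU-only build was installed
print(torch.cuda.is_available())  # should be True before layering unsloth on top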
I have the same issue... This "folding-in" sort of worked for me... When I try to install, it says 'no module named triton'... If I run pip3 install triton, I get a linking error again.
@j-d-salinger For triton - oh my there's no triton during the pytorch install? Maybe add an update flag to Conda.
@athoag-sony Hmm bitsandbytes issues :( I might have to open a large thread in BnB - installing from source might be the only other option for now - I'll see what I can do
Thanks!! I was able to get a functional bitsandbytes by building from source. Or at least, I no longer get a warning that it was compiled without GPU support.
I thought that would solve it, but there is still an ldconfig error...
I was able to get bitsandbytes to import by using python=3.11 instead of python=3.10. However, now when I try to import from unsloth I get:
>>> from unsloth import FastLanguageModel
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/ubuntu/miniconda/envs/unsloth/lib/python3.11/site-packages/unsloth/__init__.py", line 60, in <module>
import triton
ModuleNotFoundError: No module named 'triton'
Here are the steps to reproduce:
conda create -n unsloth python=3.11 -y
conda activate unsloth
conda install pytorch cudatoolkit torchvision torchaudio xformers pytorch-cuda=12.1 bitsandbytes -c pytorch -c nvidia -c conda-forge -c xformers
pip install "unsloth[conda] @ git+https://github.com/unslothai/unsloth.git"
If I then try to install triton with conda:
conda install triton -c pytorch -c nvidia -c conda-forge -c xformers
I get a new error when I try to import from unsloth:
>>> from unsloth import FastLanguageModel
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/ubuntu/miniconda/envs/unsloth/lib/python3.11/site-packages/unsloth/__init__.py", line 61, in <module>
from triton.common.build import libcuda_dirs
ModuleNotFoundError: No module named 'triton.common'
@j-d-salinger Can you run python -m bitsandbytes and python -m xformers.info in a terminal and paste your output here?
@athoag-sony Can you import triton manually and check its version via import triton; triton.__version__? It probably means it's an old version. Another approach is to use pip directly: pip install triton.
Again, apologies for the install issues - sadly Triton and BnB can be quite annoying to install.
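For reference, a minimal version check along those lines (the exact minimum version unsloth needs isn't stated here, so treat that part as an assumption):

import importlib.util
import triton

print(triton.__version__)
# unsloth imports triton.common.build.libcuda_dirs, so also confirm that submodule resolves:
print(importlib.util.find_spec("triton.common") is not None)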
I'm having a similar error; I tried the steps above and installed bitsandbytes from source, but still:
False
/root/miniconda3/envs/chatnatural/lib/python3.11/site-packages/bitsandbytes/cextension.py:31: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
warn("The installed version of bitsandbytes was compiled without GPU support. "
'NoneType' object has no attribute 'cadam32bit_grad_fp32'
/root/miniconda3/envs/chatnatural/lib/python3.11/site-packages/unsloth/__init__.py:71: UserWarning: Unsloth: Running `ldconfig /usr/lib64-nvidia` to link CUDA.
warnings.warn(
/sbin/ldconfig.real: Can't link /usr/lib/wsl/lib/libnvoptix_loader.so.1 to libnvoptix.so.1
/sbin/ldconfig.real: /usr/lib/wsl/lib/libcuda.so.1 is not a symbolic link
/root/miniconda3/envs/chatnatural/lib/python3.11/site-packages/unsloth/__init__.py:102: UserWarning: Unsloth: CUDA is not linked properly.
Try running `python -m bitsandbytes` then `python -m xformers.info`
We tried running `ldconfig /usr/lib64-nvidia` ourselves, but it didn't work.
You need to run in your terminal `sudo ldconfig /usr/lib64-nvidia` yourself, then import Unsloth.
Also try `sudo ldconfig /usr/local/cuda-xx.x` - find the latest cuda version.
Unsloth will still run for now, but maybe it might crash - let's hope it works!
warnings.warn(
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/root/miniconda3/envs/chatnatural/lib/python3.11/site-packages/unsloth/__init__.py", line 112, in <module>
from .models import *
File "/root/miniconda3/envs/chatnatural/lib/python3.11/site-packages/unsloth/models/__init__.py", line 15, in <module>
from .loader import FastLanguageModel
File "/root/miniconda3/envs/chatnatural/lib/python3.11/site-packages/unsloth/models/loader.py", line 15, in <module>
from .llama import FastLlamaModel, logger
File "/root/miniconda3/envs/chatnatural/lib/python3.11/site-packages/unsloth/models/llama.py", line 26, in <module>
from ..kernels import *
File "/root/miniconda3/envs/chatnatural/lib/python3.11/site-packages/unsloth/kernels/__init__.py", line 15, in <module>
from .cross_entropy_loss import fast_cross_entropy_loss
File "/root/miniconda3/envs/chatnatural/lib/python3.11/site-packages/unsloth/kernels/cross_entropy_loss.py", line 18, in <module>
from .utils import calculate_settings, MAX_FUSED_SIZE
File "/root/miniconda3/envs/chatnatural/lib/python3.11/site-packages/unsloth/kernels/utils.py", line 36, in <module>
cdequantize_blockwise_fp32 = bnb.functional.lib.cdequantize_blockwise_fp32
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'cdequantize_blockwise_fp32'
@danielhanchen How can I install unsloth properly?
xformers info:
xFormers 0.0.21
memory_efficient_attention.cutlassF: available
memory_efficient_attention.cutlassB: available
memory_efficient_attention.decoderF: available
[email protected]: unavailable
[email protected]: unavailable
memory_efficient_attention.smallkF: available
memory_efficient_attention.smallkB: available
memory_efficient_attention.tritonflashattF: unavailable
memory_efficient_attention.tritonflashattB: unavailable
indexing.scaled_index_addF: available
indexing.scaled_index_addB: available
indexing.index_select: available
swiglu.dual_gemm_silu: available
swiglu.gemm_fused_operand_sum: available
swiglu.fused.p.cpp: available
is_triton_available: False
is_functorch_available: False
pytorch.version: 2.1.2.post101
pytorch.cuda: not available
build.info: available
build.cuda_version: None
build.python_version: 3.11.6
build.torch_version: 2.1.0.post100
build.env.TORCH_CUDA_ARCH_LIST: None
build.env.XFORMERS_BUILD_TYPE: None
build.env.XFORMERS_ENABLE_DEBUG_ASSERTIONS: None
build.env.NVCC_FLAGS: None
build.env.XFORMERS_PACKAGE_FROM: None
source.privacy: open source
output of bitsandbytes:
/root/miniconda3/envs/chatnatural/lib/python3.11/site-packages/bitsandbytes/libbitsandbytes_cpu.so: undefined symbol: cadam32bit_grad_fp32
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++ BUG REPORT INFORMATION ++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++ ANACONDA CUDA PATHS ++++++++++++++++++++
['/root/miniconda3/envs/chatnatural/lib/libomptarget.rtl.cuda.so', '/root/miniconda3/envs/chatnatural/lib/libcudart.so']
++++++++++++++++++ /usr/local CUDA PATHS +++++++++++++++++++
[]
+++++++++++++++ WORKING DIRECTORY CUDA PATHS +++++++++++++++
[]
++++++++++++++++++++++++++ OTHER +++++++++++++++++++++++++++
COMPILED_WITH_CUDA = False
COMPUTE_CAPABILITIES_PER_GPU = []
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++ DEBUG INFO END ++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Checking that the library is importable and CUDA is callable...
WARNING: Please be sure to sanitize sensitive info from any such env vars!
Torch not compiled with CUDA enabled
Above we output some debug information. Please provide this info when creating an issue via https://github.com/TimDettmers/bitsandbytes/issues/new/choose ...
@avacaondata Hmm there is another approach which might or might not work:
pip install -U xformers --index-url https://download.pytorch.org/whl/cu121
pip install "unsloth[kaggle-new] @ git+https://github.com/unslothai/unsloth.git@nightly"
This is a nightly build so I'm still working on it - it might or might not work.
OK, thanks @danielhanchen, that seems to solve the previous issue, but now I have another one, I think related to triton:
Traceback (most recent call last):
File "/mnt/c/Users/Usuario/Documents/autolms/src/autotransformers/autotrainer.py", line 171, in train_with_fixed_params
test_results = self.train_one_model_fixed_params(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mnt/c/Users/Usuario/Documents/autolms/src/autotransformers/autotrainer.py", line 302, in train_one_model_fixed_params
self.trainer.train()
File "/root/miniconda3/envs/unsloth/lib/python3.11/site-packages/transformers/trainer.py", line 1624, in train
return inner_training_loop(
^^^^^^^^^^^^^^^^^^^^
File "<string>", line 354, in _fast_inner_training_loop
File "/root/miniconda3/envs/unsloth/lib/python3.11/site-packages/transformers/trainer.py", line 2902, in training_step
loss = self.compute_loss(model, inputs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/unsloth/lib/python3.11/site-packages/transformers/trainer.py", line 2925, in compute_loss
outputs = model(**inputs)
^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/unsloth/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/unsloth/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/unsloth/lib/python3.11/site-packages/accelerate/utils/operations.py", line 817, in forward
return model_forward(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/unsloth/lib/python3.11/site-packages/accelerate/utils/operations.py", line 805, in __call__
return convert_to_fp32(self.model_forward(*args, **kwargs))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/unsloth/lib/python3.11/site-packages/torch/amp/autocast_mode.py", line 16, in decorate_autocast
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/unsloth/lib/python3.11/site-packages/unsloth/models/llama.py", line 807, in PeftModelForCausalLM_fast_forward
return self.base_model(
^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/unsloth/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/unsloth/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/unsloth/lib/python3.11/site-packages/peft/tuners/tuners_utils.py", line 160, in forward
return self.model.forward(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/unsloth/lib/python3.11/site-packages/unsloth/models/llama.py", line 740, in _CausalLM_fast_forward
outputs = self.model(
^^^^^^^^^^^
File "/root/miniconda3/envs/unsloth/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/unsloth/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/unsloth/lib/python3.11/site-packages/unsloth/models/llama.py", line 612, in LlamaModel_fast_forward
layer_outputs = torch.utils.checkpoint.checkpoint(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/unsloth/lib/python3.11/site-packages/torch/_compile.py", line 24, in inner
return torch._dynamo.disable(fn, recursive)(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/unsloth/lib/python3.11/site-packages/torch/_dynamo/eval_frame.py", line 489, in _fn
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/unsloth/lib/python3.11/site-packages/torch/_dynamo/external_utils.py", line 17, in inner
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/unsloth/lib/python3.11/site-packages/torch/utils/checkpoint.py", line 482, in checkpoint
return CheckpointFunction.apply(function, preserve, *args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/unsloth/lib/python3.11/site-packages/torch/autograd/function.py", line 553, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/unsloth/lib/python3.11/site-packages/torch/utils/checkpoint.py", line 261, in forward
outputs = run_function(*args)
^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/unsloth/lib/python3.11/site-packages/unsloth/models/llama.py", line 608, in custom_forward
return module(*inputs, past_key_value, output_attentions, padding_mask=padding_mask)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/unsloth/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/unsloth/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/unsloth/lib/python3.11/site-packages/unsloth/models/gemma.py", line 109, in GemmaDecoderLayer_fast_forward
hidden_states = fast_rms_layernorm(self.input_layernorm, hidden_states, gemma = True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/unsloth/lib/python3.11/site-packages/unsloth/kernels/rms_layernorm.py", line 190, in fast_rms_layernorm
out = Fast_RMS_Layernorm.apply(X, W, eps, gemma)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/unsloth/lib/python3.11/site-packages/torch/autograd/function.py", line 553, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/unsloth/lib/python3.11/site-packages/unsloth/kernels/rms_layernorm.py", line 144, in forward
fx[(n_rows,)](
File "/root/miniconda3/envs/unsloth/lib/python3.11/site-packages/triton/runtime/jit.py", line 532, in run
self.cache[device][key] = compile(
^^^^^^^^
File "/root/miniconda3/envs/unsloth/lib/python3.11/site-packages/triton/compiler/compiler.py", line 614, in compile
so_path = make_stub(name, signature, constants, ids, enable_warp_specialization=enable_warp_specialization)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/unsloth/lib/python3.11/site-packages/triton/compiler/make_launcher.py", line 37, in make_stub
so = _build(name, src_path, tmpdir)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/unsloth/lib/python3.11/site-packages/triton/common/build.py", line 83, in _build
raise RuntimeError("Failed to find C compiler. Please specify via CC environment variable.")
RuntimeError: Failed to find C compiler. Please specify via CC environment variable.
@avacaondata Wait, are you on WSL / Linux?
Yes, I'm on WSL with Ubuntu 20 - is that a problem for unsloth? @danielhanchen
@avacaondata You'll want to sudo apt install build-essential, and if you have the same problem, try export CC=/usr/bin/gcc (make sure that's the right path).
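A small sanity check for the compiler path (my own sketch; /usr/bin/gcc is an assumption and may differ on your machine):

import os
import shutil

gcc = shutil.which("gcc")   # e.g. /usr/bin/gcc after installing build-essential
print("gcc found at:", gcc)
if gcc and "CC" not in os.environ:
    os.environ["CC"] = gcc  # same effect as `export CC=...`, but for this process and its children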
For those issues with BNB, I'd like to know if it is with the 0.43.0 release or a prior one, as the build/distribution pipeline has had some changes for 0.43.0. It should find the libraries that come distributed with PyTorch, but there are some situations where it may not (e.g. other CUDA toolkit versions installed, environment settings like LD_LIBRARY_PATH, PATH, etc.). I'm planning to continue contributing over there to make this more seamless.
@matthewdouglas Thank you! ❤️ At least the error changed:
Error out of memory at line 383 in file /src/csrc/pythonInterface.cpp
However, that's strange, because with normal QLoRA (without the unsloth wrapper) it works. Moreover, I'm checking the GPU memory and at no point does it go above 15GB.
@avacaondata What GPU do you have? I am assuming something with 16GB vRAM. Can you share the code you're using?
It sounds like you're using a paged optimizer with unsloth - is that also the case without unsloth? Any other differences, e.g. batch size, gradient accumulation, seq length, etc.?
@matthewdouglas Thanks so much for helping the community - much appreciated!!
@avacaondata Hmmm, unsure in terms of that error - can you try watch -n3 nvidia-smi in another terminal to log the VRAM usage? Maybe that can help debug what the issue is.
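Another rough option (a sketch of mine, not something from this thread) is to log the allocator's peak usage from inside the training process:

import torch

torch.cuda.reset_peak_memory_stats()
# ... run a few training steps here ...
print(f"peak allocated: {torch.cuda.max_memory_allocated() / 1024**3:.2f} GiB")
print(f"peak reserved:  {torch.cuda.max_memory_reserved() / 1024**3:.2f} GiB")

Note this only sees PyTorch's caching allocator; memory that bitsandbytes allocates directly (e.g. for paged optimizer states) will generally only show up in nvidia-smi.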
@j-d-salinger @athoag-sony Forgot to say there is another way, but it'll require a new environment:
pip install -U xformers --index-url https://download.pytorch.org/whl/cu121
pip install "unsloth[kaggle-new] @ git+https://github.com/unslothai/unsloth.git@nightly"
It's on a nightly build since I'm still trying to see how to best install stuff properly
I'm checking the VRAM usage of the GPU and it never goes above 15GB; I'm using a 24GB GPU (RTX 3090), so it's not even close to going OOM. Without unsloth the error doesn't happen. As for the rest of the parameters, I've kept them equal - let me share my code @matthewdouglas
from autotransformers import AutoTrainer, DatasetConfig, ModelConfig
from autotransformers.llm_templates import activate_neftune, QLoraWrapperModelInit, modify_tokenizer, qlora_config, SavePeftModelCallback
from functools import partial
from peft import LoraConfig
from datasets import load_dataset
from transformers import (
PreTrainedModel,
)
from peft import LoftQConfig
import torch
from peft.tuners.lora import LoraLayer
from typing import Any
from unsloth import FastLanguageModel
import os
os.environ["HF_HOME"]="/mnt/d/.cache/huggingface"
alpaca = load_dataset("somosnlp/somos-clean-alpaca-es")
CHAT_TEMPLATE = """{% for message in messages %}
{% if message['role'] == 'user' %}
{{'<user> ' + message['content'].strip() + ' </user>' }}
{% elif message['role'] == 'system' %}
{{'<system>\\n' + message['content'].strip() + '\\n</system>\\n\\n' }}
{% elif message['role'] == 'assistant' %}
{{ message['content'].strip() + ' </assistant>' + eos_token }}
{% elif message['role'] == 'input' %}
{{'<input> ' + message['content'] + ' </input>' }}
{% endif %}
{% endfor %}"""
def process_alpaca(sample: dict) -> dict:
chat = [
{"role": "system", "content": "Eres un asistente que resuelve las instrucciones del usuario. Si se proporciona contexto adicional, utiliza esa información para completar la instrucción."}
]
inp_ = sample["inputs"]["2-input"]
if inp_ is not None and inp_ != "":
chat.append(
{"role": "input", "content": inp_}
)
chat.extend(
[
{"role": "user", "content": sample["inputs"]["1-instruction"]},
{"role": "assistant", "content": sample["inputs"]["3-output"]}
]
)
sample["messages"] = chat
return sample
alpaca = alpaca.map(process_alpaca, batched=False, num_proc=4, remove_columns=[col for col in alpaca["train"].column_names if col != "messages"])
alpaca = alpaca["train"].train_test_split(0.2, seed=203984)
fixed_train_args = {
"per_device_train_batch_size": 1,
"per_device_eval_batch_size": 1,
"gradient_accumulation_steps": 16,
"warmup_ratio": 0.03,
"learning_rate": 2e-4, # 5e-5,
"bf16": True,
"logging_steps": 50,
"lr_scheduler_type": "constant", # "linear",
"weight_decay": 0.001,
"eval_steps": 200,
"save_steps": 50,
"num_train_epochs": 1,
"logging_first_step": True,
# "report_to": ["tensorboard"],
"evaluation_strategy": "steps", # "steps"
"save_strategy": "steps",
"max_grad_norm": 0.3,
"optim": "paged_adamw_32bit",
"gradient_checkpointing": True,
"group_by_length": False,
"save_total_limit": 50,
"adam_beta2": 0.999
}
lora_config = LoraConfig(
r=64,
lora_alpha=16,
target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
"gate_proj", "up_proj", "down_proj",], # "all-linear", # "query_key_value" # "Wqkv"
lora_dropout=0, # 0.1 for <13B models, 0.05 otherwise.
bias="none",
task_type="CAUSAL_LM"
)
alpaca_config = {
"seed": 9834,
"direction_optimize": "minimize",
"metric_optimize": "eval_loss",
"callbacks": [SavePeftModelCallback],
"fixed_training_args": fixed_train_args,
"dataset_name": "alpaca",
"alias": "alpaca",
"retrain_at_end": False,
"task": "chatbot",
"text_field": "messages",
"label_col": "messages",
"num_proc": 4,
"partial_split": True, # to create a validation split.
"loaded_dataset": alpaca
}
alpaca_config = DatasetConfig(**alpaca_config)
class FastQLoraWrapperModelInit:
"""
A wrapper class for initializing transformer-based models with QLoRa and gradient checkpointing.
This class serves as a wrapper for the `model_init` function, which initializes the model.
It activates gradient checkpointing when possible and applies QLoRa to the model.
Parameters
----------
model_init : callable
A function that initializes the transformer-based model for training.
model_config : Any
The configuration for the model.
tokenizer : Any
The tokenizer used for tokenization.
Returns
-------
Pre-trained model with QLoRa and gradient checkpointing, if enabled.
"""
def __init__(self, model_init: Any, model_config: Any, tokenizer: Any) -> None:
self.model_init = model_init
self.model_config = model_config
self.tokenizer = tokenizer
def __call__(self) -> PreTrainedModel:
"""
Initialize the model and apply QLoRa and gradient checkpointing when configured.
Returns
-------
Pre-trained model with QLoRa and gradient checkpointing, if enabled.
"""
model, _ = FastLanguageModel.from_pretrained(
model_name = self.model_config.name, # Choose ANY! eg teknium/OpenHermes-2.5-Mistral-7B
max_seq_length = 8192,
dtype = None,
load_in_4bit = True,
# token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
)
has_gradient_checkpointing = False
if not model.__class__.__name__ in [
"MPTForCausalLM",
"MixFormerSequentialForCausalLM",
]:
try:
model.resize_token_embeddings(len(self.tokenizer))
except Exception as e:
print(
f"Could not resize token embeddings due to {e}, but will continue anyway..."
)
try:
model.gradient_checkpointing_enable()
has_gradient_checkpointing = True
except Exception as e:
print(f"Model checkpointing did not work: {e}")
if model.__class__.__name__ == "LlamaForCausalLM":
model.config.pretraining_tp = 1
# model = prepare_model_for_kbit_training(
# model, use_gradient_checkpointing=has_gradient_checkpointing
# )
# model = get_peft_model(model, self.model_config.peft_config)
model = FastLanguageModel.get_peft_model(
model,
r = self.model_config.peft_config.r,
target_modules = self.model_config.peft_config.target_modules,
lora_alpha = self.model_config.peft_config.lora_alpha,
lora_dropout = self.model_config.peft_config.lora_dropout, # Supports any, but = 0 is optimized
bias = "none", # Supports any, but = "none" is optimized
use_gradient_checkpointing = has_gradient_checkpointing,
random_state = 3407,
use_rslora = True, # We support rank stabilized LoRA
# loftq_config = LoftQConfig(loftq_bits=4), # And LoftQ
modules_to_save = ["lm_head", "embed_tokens",],
)
model.config.use_cache = False
if self.model_config.neftune_noise_alpha is not None:
model = activate_neftune(model, self.model_config.neftune_noise_alpha)
# model = self.change_layer_types_for_stability(model)
return model
def change_layer_types_for_stability(
self, model: PreTrainedModel
) -> PreTrainedModel:
"""
Change layer types of the model for stability.
Parameters
----------
model : PreTrainedModel
The pre-trained model.
Returns
-------
Pre-trained model with modified layer types for stability.
"""
for name, module in model.named_modules():
if isinstance(module, LoraLayer):
module = module.to(torch.bfloat16)
if "norm" in name:
module = module.to(torch.float32)
if "lm_head" in name or "embed_tokens" in name:
if hasattr(module, "weight"):
module = module.to(torch.bfloat16)
return model
gemma_config = ModelConfig(
name="google/gemma-2b-it",
save_name="gemma_2b_alpaca",
save_dir="/mnt/d/prueba_gemma_alpaca",
custom_params_model={"trust_remote_code": True, "device_map": {"": 0}},
model_init_wrap_cls=FastQLoraWrapperModelInit, # FastQLoraWrapperModelInit,
quantization_config=qlora_config,
peft_config=lora_config,
func_modify_tokenizer=partial(modify_tokenizer, new_model_seq_length=4096, add_special_tokens={"pad_token": "[PAD]"}, chat_template=CHAT_TEMPLATE)
)
autotrainer = AutoTrainer(
model_configs=[gemma_config],
dataset_configs=[alpaca_config],
metrics_dir="./metrics_alpaca",
hp_search_mode="fixed",
clean=True,
metrics_cleaner="tmp_metrics_cleaner",
use_auth_token=True
)
results = autotrainer()
It uses the library autotransformers which uses HF libraries in the background.
Thanks a lot for the help btw! :)
@avacaondata Apologies on the delay - I'll check it out!
Oh wait @athoag-sony @j-d-salinger I think someone else solved it! Turns out Pytorch installs 2.2.1, but Xformers requires 2.2.0.
Below should work:
conda create --name unsloth_env python=3.10
conda activate unsloth_env
conda install pytorch==2.2.0 cudatoolkit torchvision torchaudio pytorch-cuda=<12.1/11.8> -c pytorch -c nvidia
conda install xformers -c xformers
pip install bitsandbytes
pip install "unsloth[conda] @ git+https://github.com/unslothai/unsloth.git"
Actually, an update: xformers added support for 2.2.1 a few hours ago!!
@danielhanchen The following worked as you suggested:
conda create --name unsloth_env python=3.10
conda activate unsloth_env
conda install pytorch==2.2.0 cudatoolkit torchvision torchaudio pytorch-cuda=<12.1/11.8> -c pytorch -c nvidia
conda install xformers -c xformers
pip install bitsandbytes
pip install "unsloth[conda] @ git+https://github.com/unslothai/unsloth.git"
Or at least I can do the import without any error now:
from unsloth import FastLanguageModel
FYI the import is quite slow (10-15 seconds). I'm not sure if that is standard.
Thanks for your help.
@athoag-sony I'll work on something to make it faster, but it's generally checking for all the necessary packages to see if they work :)
@avacaondata I forgot to ask whether you have PyTorch 2.2.0 or 2.2.1? If so, I updated Unsloth to reduce VRAM fragmentation, which might be the cause of the OOMs - although I'm not sure. It can sometimes reduce VRAM usage by 3GB!!
Oh wait @athoag-sony @j-d-salinger I think someone else solved it! Turns out Pytorch installs 2.2.1, but Xformers requires 2.2.0.
Below should work:
conda create --name unsloth_env python=3.10
conda activate unsloth_env
conda install pytorch==2.2.0 cudatoolkit torchvision torchaudio pytorch-cuda=<12.1/11.8> -c pytorch -c nvidia
conda install xformers -c xformers
pip install bitsandbytes
pip install "unsloth[conda] @ git+https://github.com/unslothai/unsloth.git"
I've just tried this but still get the same error. Have tried recreating the conda env with no success.
++++++++++++++++++ BUG REPORT INFORMATION ++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++ ANACONDA CUDA PATHS ++++++++++++++++++++
['/home/anon/miniconda3/envs/unsloth_env/lib/libcudart.so']
++++++++++++++++++ /usr/local CUDA PATHS +++++++++++++++++++
[]
+++++++++++++++ WORKING DIRECTORY CUDA PATHS +++++++++++++++
[]
++ LD_LIBRARY_PATH /usr/local/cuda-12.4/lib64 CUDA PATHS +++
['/usr/local/cuda-12.4/lib64/stubs/libcuda.so']
++++++++++++++++++++++++++ OTHER +++++++++++++++++++++++++++
COMPILED_WITH_CUDA = True
COMPUTE_CAPABILITIES_PER_GPU = ['8.6']
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++ DEBUG INFO END ++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Checking that the library is importable and CUDA is callable...
WARNING: Please be sure to sanitize sensitive info from any such env vars!
SUCCESS! Installation was successful!
Unable to find python bindings at /usr/local/dcgm/bindings/python3. No data will be captured.
xFormers 0.0.24+cu118
memory_efficient_attention.cutlassF: available
memory_efficient_attention.cutlassB: available
memory_efficient_attention.decoderF: available
[email protected]: available
[email protected]: available
memory_efficient_attention.smallkF: available
memory_efficient_attention.smallkB: available
memory_efficient_attention.tritonflashattF: unavailable
memory_efficient_attention.tritonflashattB: unavailable
memory_efficient_attention.triton_splitKF: available
indexing.scaled_index_addF: available
indexing.scaled_index_addB: available
indexing.index_select: available
sequence_parallel_fused.write_values: unavailable
sequence_parallel_fused.wait_values: unavailable
sequence_parallel_fused.cuda_memset_32b_async: unavailable
sp24.sparse24_sparsify_both_ways: available
sp24.sparse24_apply: available
sp24.sparse24_apply_dense_output: available
sp24._sparse24_gemm: available
[email protected]: available
swiglu.dual_gemm_silu: available
swiglu.gemm_fused_operand_sum: available
swiglu.fused.p.cpp: available
is_triton_available: True
pytorch.version: 2.2.0+cu121
pytorch.cuda: available
gpu.compute_capability: 8.6
gpu.name: NVIDIA GeForce RTX 3090
dcgm_profiler: unavailable
build.info: available
build.cuda_version: 1108
build.python_version: 3.10.13
build.torch_version: 2.2.0+cu118
build.env.TORCH_CUDA_ARCH_LIST: 5.0+PTX 6.0 6.1 7.0 7.5 8.0+PTX 9.0
build.env.XFORMERS_BUILD_TYPE: Release
build.env.XFORMERS_ENABLE_DEBUG_ASSERTIONS: None
build.env.NVCC_FLAGS: None
build.env.XFORMERS_PACKAGE_FROM: wheel-v0.0.24
build.nvcc_version: 11.8.89
source.privacy: open source
from unsloth import FastLanguageModel
/home/anon/.local/lib/python3.10/site-packages/unsloth/__init__.py:72: UserWarning: Unsloth: Running `ldconfig /usr/lib64-nvidia` to link CUDA.
warnings.warn(
/sbin/ldconfig.real: Can't create temporary cache file /etc/ld.so.cache~: Permission denied
/home/anon/.local/lib/python3.10/site-packages/unsloth/__init__.py:103: UserWarning: Unsloth: CUDA is not linked properly.
Try running `python -m bitsandbytes` then `python -m xformers.info`
We tried running `ldconfig /usr/lib64-nvidia` ourselves, but it didn't work.
You need to run in your terminal `sudo ldconfig /usr/lib64-nvidia` yourself, then import Unsloth.
Also try `sudo ldconfig /usr/local/cuda-xx.x` - find the latest cuda version.
Unsloth will still run for now, but maybe it might crash - let's hope it works!
warnings.warn(
@socallme We have new install instructions for that!
# RTX 3090, 4090 Ampere GPUs:
pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
pip install --no-deps packaging ninja einops flash-attn xformers trl peft accelerate bitsandbytes
# Pre Ampere RTX 2080, T4, GTX 1080 GPUs:
pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
pip install --no-deps xformers trl peft accelerate bitsandbytes
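If you're unsure which of the two applies, a small helper like this (my own sketch, not part of the official instructions) reports the GPU's compute capability - Ampere and newer cards are 8.x and above:

import torch

major, minor = torch.cuda.get_device_capability(0)
print(f"compute capability: {major}.{minor}")
if major >= 8:
    print("Ampere or newer (e.g. RTX 3090/4090, A100): use the flash-attn install line")
else:
    print("Pre-Ampere (e.g. T4, RTX 2080, GTX 1080): use the plain xformers install line")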
@danielhanchen I have tried those instructions with the same result. There are never any errors or warnings when installing either.
/usr/local/lib/python3.10/dist-packages/unsloth/__init__.py:72: UserWarning: Unsloth: Running `ldconfig /usr/lib64-nvidia` to link CUDA.
warnings.warn(
/usr/local/lib/python3.10/dist-packages/unsloth/__init__.py:103: UserWarning: Unsloth: CUDA is not linked properly.
@socallme Are you using conda? Can you try running nvcc in the terminal, and nvidia-smi?
@danielhanchen I am using conda. Are there links I need to make manually?
I am able to run other tools without issue in conda, e.g. axolotl and llama.cpp. Output of nvcc and nvidia-smi:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Mar_28_02:18:24_PDT_2024
Cuda compilation tools, release 12.4, V12.4.131
Build cuda_12.4.r12.4/compiler.34097967_0
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15 Driver Version: 550.54.15 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 3090 Off | 00000000:04:00.0 On | N/A |
| 70% 75C P2 298W / 300W | 21862MiB / 24576MiB | 97% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 788204 C python 21856MiB |
+-----------------------------------------------------------------------------------------+