
Docker image is not finding libcudart.so

Open thistleknot opened this issue 2 years ago • 7 comments

Please check that this issue hasn't been reported before.

  • [X] I searched previous Bug Reports and didn't find any similar reports.

Expected Behavior

python -m bitsandbytes returns no error

Current behaviour

docker run --gpus '"all"' --rm -it winglian/axolotl:main-py3.10-cu118-2.0.1

python -m bitsandbytes

/workspace/bitsandbytes/bitsandbytes/cuda_setup/main.py:166: UserWarning: Found duplicate ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] files: {PosixPath('/usr/local/cuda/lib64/libcudart.so'), PosixPath('/usr/local/cuda/lib64/libcudart.so.11.0')}.
We select the PyTorch default libcudart.so, which is {torch.version.cuda}, but this might mismatch with the CUDA version that is needed for bitsandbytes.
To override this behavior set the BNB_CUDA_VERSION=<version string, e.g. 122> environment variable.
For example, if you want to use CUDA version 122: BNB_CUDA_VERSION=122 python ...
OR set the environment variable in your .bashrc: export BNB_CUDA_VERSION=122
In the case of a manual override, make sure you set the LD_LIBRARY_PATH, e.g. export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-11.2
  warn(msg)
DEBUG: Possible options found for libcudart.so: {PosixPath('/usr/local/cuda/lib64/libcudart.so'), PosixPath('/usr/local/cuda/lib64/libcudart.so.11.0')}
CUDA SETUP: PyTorch settings found: CUDA_VERSION=118, Highest Compute Capability: 6.1.
CUDA SETUP: To manually override the PyTorch CUDA version please see:https://github.com/TimDettmers/bitsandbytes/blob/main/how_to_use_nonpytorch_cuda.md
/workspace/bitsandbytes/bitsandbytes/cuda_setup/main.py:166: UserWarning: WARNING: Compute capability < 7.5 detected! Only slow 8-bit matmul is supported for your GPU!                     If you run into issues with 8-bit matmul, you can try 4-bit quantization: https://huggingface.co/blog/4bit-transformers-bitsandbytes
  warn(msg)
CUDA SETUP: Required library version not found: libbitsandbytes_cuda118_nocublaslt.so. Maybe you need to compile it from source?
CUDA SETUP: Defaulting to libbitsandbytes_cpu.so...

================================================ERROR=====================================
CUDA SETUP: CUDA detection failed! Possible reasons:
1. You need to manually override the PyTorch CUDA version. Please see: "https://github.com/TimDettmers/bitsandbytes/blob/main/how_to_use_nonpytorch_cuda.md
2. CUDA driver not installed
3. CUDA not installed
4. You have multiple conflicting CUDA libraries
5. Required library not pre-compiled for this bitsandbytes release!
CUDA SETUP: If you compiled from source, try again with `make CUDA_VERSION=DETECTED_CUDA_VERSION` for example, `make CUDA_VERSION=113`.
CUDA SETUP: The CUDA version for the compile might depend on your conda install. Inspect CUDA version via `conda list | grep cuda`.
================================================================================

CUDA SETUP: Something unexpected happened. Please compile from source:
git clone https://github.com/TimDettmers/bitsandbytes.git
cd bitsandbytes
CUDA_VERSION=118 make cuda11x_nomatmul
python setup.py install
CUDA SETUP: Setup Failed!
Traceback (most recent call last):
  File "/root/miniconda3/envs/py3.10/lib/python3.10/runpy.py", line 187, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/runpy.py", line 146, in _get_module_details
    return _get_module_details(pkg_main_name, error)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/runpy.py", line 110, in _get_module_details
    __import__(pkg_name)
  File "/workspace/bitsandbytes/bitsandbytes/__init__.py", line 6, in <module>
    from . import cuda_setup, utils, research
  File "/workspace/bitsandbytes/bitsandbytes/research/__init__.py", line 1, in <module>
    from . import nn
  File "/workspace/bitsandbytes/bitsandbytes/research/nn/__init__.py", line 1, in <module>
    from .modules import LinearFP8Mixed, LinearFP8Global
  File "/workspace/bitsandbytes/bitsandbytes/research/nn/modules.py", line 8, in <module>
    from bitsandbytes.optim import GlobalOptimManager
  File "/workspace/bitsandbytes/bitsandbytes/optim/__init__.py", line 6, in <module>
    from bitsandbytes.cextension import COMPILED_WITH_CUDA
  File "/workspace/bitsandbytes/bitsandbytes/cextension.py", line 20, in <module>
    raise RuntimeError('''
RuntimeError:
        CUDA Setup failed despite GPU being available. Please run the following command to get more information:

        python -m bitsandbytes

        Inspect the output of the command and see if you can locate CUDA libraries. You might need to add them
        to your LD_LIBRARY_PATH. If you suspect a bug, please take the information from python -m bitsandbytes
        and open an issue at: https://github.com/TimDettmers/bitsandbytes/issues
root@e17cbb5279a5:/workspace/bitsandbytes# ^C
root@e17cbb5279a5:/workspace/bitsandbytes# CUDA_VERSION=118 make cuda11x_nomatmul; python setup.py install; python -m bitsandbytes^C
root@e17cbb5279a5:/workspace/bitsandbytes# cd ..
root@e17cbb5279a5:/workspace# ^C
root@e17cbb5279a5:/workspace# nvidia-smi
Fri Oct  6 01:53:52 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Quadro P1000                   Off | 00000000:05:00.0 Off |                  N/A |
| 40%   52C    P8              N/A /  N/A |      0MiB /  4096MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
root@e17cbb5279a5:/workspace#

Steps to reproduce

Run the Docker image on a host with CUDA 12.2 installed and attempt to run python -m bitsandbytes (exact commands below).
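
Concretely, the repro is the same two commands shown in the log above:

    docker run --gpus '"all"' --rm -it winglian/axolotl:main-py3.10-cu118-2.0.1
    python -m bitsandbytes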

Config yaml

No response

Possible solution

No response

Which Operating Systems are you using?

  • [X] Linux
  • [ ] macOS
  • [ ] Windows

Python Version

Version in Docker (Python 3.10)

axolotl branch-commit

main/f90d5bb

Acknowledgements

  • [X] My issue title is concise, descriptive, and in title casing.
  • [X] I have searched the existing issues to make sure this bug has not been reported yet.
  • [X] I am using the latest version of axolotl.
  • [X] I have provided enough information for the maintainers to reproduce and diagnose the issue.

thistleknot avatar Oct 06 '23 01:10 thistleknot

okay, making progress

 git clone -b cuda-122-support-9_0 https://github.com/OpenAccess-AI-Collective/axolotl
 cd axolotl/
 pip3 install -e .
 pip3 install -U git+https://github.com/huggingface/peft.git
 # finetune lora
 accelerate launch scripts/finetune.py examples/openllama-3b/lora.yml

which gave me a new error

TypeError: xformers_forward() got an unexpected keyword argument 'padding_mask'

thistleknot avatar Oct 06 '23 03:10 thistleknot

I got further with docker up, but now I get an error on flash_attn_func.

Running docker up and then accelerate launch scripts/finetune.py examples/openllama-3b/qlora.yml leads to:

ImportError: cannot import name 'flash_attn_func' from 'flash_attn'

I tried all of these too:

    pip install flash-attn==1.0.9
    pip install flash-attn --no-build-isolation

pip3 install -e '.[flash-attn,deepspeed]'

I noticed there is a miniconda install; I tried activating py3.9 and running, but hit the same issue.

I can't seem to get this to work. Even the following sequence:

    pip3 install -e .
    pip3 install packaging
    pip3 install -e '.[flash-attn,deepspeed]'
    pip3 install -U git+https://github.com/huggingface/peft.git
    pip3 install -e .
    
    accelerate launch scripts/finetune.py examples/openllama-3b/qlora.yml

results in the padding_mask error.

The two other errors I've gotten are the flash_attn_func error and a TransformerEngine error.
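
A quick sanity check to tell which failure mode you're in (a sketch; the import names come from the tracebacks above, and the transformers version cutoff is my assumption):

    # is flash-attn installed, and which major version? the top-level
    # flash_attn_func export is a 2.x thing, which would explain the
    # ImportError after installing flash-attn==1.0.9
    python -c "import flash_attn; print(flash_attn.__version__)"
    python -c "from flash_attn import flash_attn_func; print('flash_attn_func OK')"
    # the padding_mask kwarg appeared around transformers 4.34, which is
    # what breaks the monkey-patched xformers_forward()
    python -c "import transformers; print(transformers.__version__)"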

thistleknot avatar Oct 06 '23 04:10 thistleknot

I got it working

#comment out padding_mask
nano +635 "/home/user/miniconda3/envs/textgen/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py"

thistleknot avatar Oct 06 '23 04:10 thistleknot

Hm, that's weird. I have nvidia-smi showing CUDA 12.0 on host and I can run python -m bitsandbytes successfully in docker.

If you have the axolotl repo cloned, do you want to try building the image yourself?

docker compose build
docker compose up -d
docker ps
docker exec -it <hash> bash
python -m bitsandbytes

NanoCode012 avatar Oct 06 '23 14:10 NanoCode012

I'm getting this same error when using the docker image. NVIDIA-SMI shows CUDA version 12.2. I can run python -m bitsandbytes successfully, though it says that it is targeting CUDA 11.8 (BNB_CUDA_VERSION=118). @NanoCode012 Do I need to try and align these CUDA versions?

@thistleknot, if you comment out the padding_mask line, doesn't this mean that the LLM will be trained on the entire input rather than just the output sections?

ssmi153 avatar Oct 09 '23 01:10 ssmi153

I can confirm that @thistleknot 's fix works (though I'm not sure whether this messes up content masking or not). For me, running on Runpod, this command was: nano +635 "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py" and then commenting out the padding_mask line. This is very much a temporary fix though.

ssmi153 avatar Oct 09 '23 01:10 ssmi153

Sorry for the late reply.

I can run python -m bitsandbytes successfully, though it says that it is targeting CUDA 11.8 (BNB_CUDA_VERSION=118)

Axolotl targets CUDA 11.8 for the default image. You need to adjust the Docker build arg if building it yourself. We also build images for other CUDA versions, which can be pulled from: https://hub.docker.com/r/winglian/axolotl-runpod/tags
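
For reference, the manual override that the bitsandbytes warning describes looks like this (the 12.2 paths are illustrative; they depend on what is actually installed in the image):

    # point bitsandbytes at a specific CUDA build instead of PyTorch's default
    export BNB_CUDA_VERSION=122
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-12.2/lib64
    python -m bitsandbytes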

Regarding the second issue, does this still occur? Would you be able to use FA instead of xformers?

NanoCode012 avatar Mar 30 '24 17:03 NanoCode012