
Docker image is not finding libcudart.so

Open thistleknot opened this issue 2 years ago • 7 comments

Please check that this issue hasn't been reported before.

  • [X] I searched previous Bug Reports and didn't find any similar reports.

Expected Behavior

python -m bitsandbytes returns no error

Current behaviour

docker run --gpus '"all"' --rm -it winglian/axolotl:main-py3.10-cu118-2.0.1

python -m bitsandbytes

/workspace/bitsandbytes/bitsandbytes/cuda_setup/main.py:166: UserWarning: Found duplicate ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] files: {PosixPath('/usr/local/cuda/lib64/libcudart.so'), PosixPath('/usr/local/cuda/lib64/libcudart.so.11.0')}.
We select the PyTorch default libcudart.so, which is {torch.version.cuda}, but this might mismatch with the CUDA version that is needed for bitsandbytes.
To override this behavior set the BNB_CUDA_VERSION=<version string, e.g. 122> environment variable.
For example, if you want to use CUDA version 122: BNB_CUDA_VERSION=122 python ...
OR set the environment variable in your .bashrc: export BNB_CUDA_VERSION=122
In the case of a manual override, make sure you set the LD_LIBRARY_PATH, e.g. export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-11.2
  warn(msg)
DEBUG: Possible options found for libcudart.so: {PosixPath('/usr/local/cuda/lib64/libcudart.so'), PosixPath('/usr/local/cuda/lib64/libcudart.so.11.0')}
CUDA SETUP: PyTorch settings found: CUDA_VERSION=118, Highest Compute Capability: 6.1.
CUDA SETUP: To manually override the PyTorch CUDA version please see:https://github.com/TimDettmers/bitsandbytes/blob/main/how_to_use_nonpytorch_cuda.md
/workspace/bitsandbytes/bitsandbytes/cuda_setup/main.py:166: UserWarning: WARNING: Compute capability < 7.5 detected! Only slow 8-bit matmul is supported for your GPU!                     If you run into issues with 8-bit matmul, you can try 4-bit quantization: https://huggingface.co/blog/4bit-transformers-bitsandbytes
  warn(msg)
CUDA SETUP: Required library version not found: libbitsandbytes_cuda118_nocublaslt.so. Maybe you need to compile it from source?
CUDA SETUP: Defaulting to libbitsandbytes_cpu.so...

================================================ERROR=====================================
CUDA SETUP: CUDA detection failed! Possible reasons:
1. You need to manually override the PyTorch CUDA version. Please see: "https://github.com/TimDettmers/bitsandbytes/blob/main/how_to_use_nonpytorch_cuda.md
2. CUDA driver not installed
3. CUDA not installed
4. You have multiple conflicting CUDA libraries
5. Required library not pre-compiled for this bitsandbytes release!
CUDA SETUP: If you compiled from source, try again with `make CUDA_VERSION=DETECTED_CUDA_VERSION` for example, `make CUDA_VERSION=113`.
CUDA SETUP: The CUDA version for the compile might depend on your conda install. Inspect CUDA version via `conda list | grep cuda`.
================================================================================

CUDA SETUP: Something unexpected happened. Please compile from source:
git clone https://github.com/TimDettmers/bitsandbytes.git
cd bitsandbytes
CUDA_VERSION=118 make cuda11x_nomatmul
python setup.py install
CUDA SETUP: Setup Failed!
Traceback (most recent call last):
  File "/root/miniconda3/envs/py3.10/lib/python3.10/runpy.py", line 187, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/runpy.py", line 146, in _get_module_details
    return _get_module_details(pkg_main_name, error)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/runpy.py", line 110, in _get_module_details
    __import__(pkg_name)
  File "/workspace/bitsandbytes/bitsandbytes/__init__.py", line 6, in <module>
    from . import cuda_setup, utils, research
  File "/workspace/bitsandbytes/bitsandbytes/research/__init__.py", line 1, in <module>
    from . import nn
  File "/workspace/bitsandbytes/bitsandbytes/research/nn/__init__.py", line 1, in <module>
    from .modules import LinearFP8Mixed, LinearFP8Global
  File "/workspace/bitsandbytes/bitsandbytes/research/nn/modules.py", line 8, in <module>
    from bitsandbytes.optim import GlobalOptimManager
  File "/workspace/bitsandbytes/bitsandbytes/optim/__init__.py", line 6, in <module>
    from bitsandbytes.cextension import COMPILED_WITH_CUDA
  File "/workspace/bitsandbytes/bitsandbytes/cextension.py", line 20, in <module>
    raise RuntimeError('''
RuntimeError:
        CUDA Setup failed despite GPU being available. Please run the following command to get more information:

        python -m bitsandbytes

        Inspect the output of the command and see if you can locate CUDA libraries. You might need to add them
        to your LD_LIBRARY_PATH. If you suspect a bug, please take the information from python -m bitsandbytes
        and open an issue at: https://github.com/TimDettmers/bitsandbytes/issues
root@e17cbb5279a5:/workspace/bitsandbytes# ^C
root@e17cbb5279a5:/workspace/bitsandbytes# CUDA_VERSION=118 make cuda11x_nomatmul; python setup.py install; python -m bitsandbytes^C
root@e17cbb5279a5:/workspace/bitsandbytes# cd ..
root@e17cbb5279a5:/workspace# ^C
root@e17cbb5279a5:/workspace# nvidia-smi
Fri Oct  6 01:53:52 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Quadro P1000                   Off | 00000000:05:00.0 Off |                  N/A |
| 40%   52C    P8              N/A /  N/A |      0MiB /  4096MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
root@e17cbb5279a5:/workspace#

Steps to reproduce

Run the Docker image on a host with CUDA 12.2 installed and attempt to run python -m bitsandbytes (exact commands below).
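
Concretely, the repro is the same two commands shown in the log above:

    docker run --gpus '"all"' --rm -it winglian/axolotl:main-py3.10-cu118-2.0.1
    python -m bitsandbytes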

Config yaml

No response

Possible solution

No response

Which Operating Systems are you using?

  • [X] Linux
  • [ ] macOS
  • [ ] Windows

Python Version

Version in Docker (Python 3.10)

axolotl branch-commit

main/f90d5bb

Acknowledgements

  • [X] My issue title is concise, descriptive, and in title casing.
  • [X] I have searched the existing issues to make sure this bug has not been reported yet.
  • [X] I am using the latest version of axolotl.
  • [X] I have provided enough information for the maintainers to reproduce and diagnose the issue.

thistleknot avatar Oct 06 '23 01:10 thistleknot

okay, making progress

 git clone -b cuda-122-support-9_0 https://github.com/OpenAccess-AI-Collective/axolotl
 cd axolotl/
 pip3 install -e .
 pip3 install -U git+https://github.com/huggingface/peft.git
 # finetune lora
 accelerate launch scripts/finetune.py examples/openllama-3b/lora.yml

which gave me a new error

TypeError: xformers_forward() got an unexpected keyword argument 'padding_mask'

thistleknot avatar Oct 06 '23 03:10 thistleknot

I got further with docker up, but now I get an error on flash_attn_func.

Running docker up and then accelerate launch scripts/finetune.py examples/openllama-3b/qlora.yml leads to:

ImportError: cannot import name 'flash_attn_func' from 'flash_attn'

I tried all of these too:

    pip install flash-attn==1.0.9
    pip install flash-attn --no-build-isolation

pip3 install -e '.[flash-attn,deepspeed]'

I noticed there is a miniconda install; I tried activating py3.9 and running, but hit the same issue.

I can't seem to get this to work. Even the following sequence:

    pip3 install -e .
    pip3 install packaging
    pip3 install -e '.[flash-attn,deepspeed]'
    pip3 install -U git+https://github.com/huggingface/peft.git
    pip3 install -e .
    
    accelerate launch scripts/finetune.py examples/openllama-3b/qlora.yml

results in the padding_mask error.

The two other errors I've gotten are the flash_attn_func error and a TransformerEngine error.
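
A quick sanity check to tell which failure mode you're in (a sketch; the import names come from the tracebacks above, and the transformers version cutoff is my assumption):

    # is flash-attn installed, and which major version? the top-level
    # flash_attn_func export is a 2.x thing, which would explain the
    # ImportError after installing flash-attn==1.0.9
    python -c "import flash_attn; print(flash_attn.__version__)"
    python -c "from flash_attn import flash_attn_func; print('flash_attn_func OK')"
    # the padding_mask kwarg appeared around transformers 4.34, which is
    # what breaks the monkey-patched xformers_forward()
    python -c "import transformers; print(transformers.__version__)"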

thistleknot avatar Oct 06 '23 04:10 thistleknot

I got it working

#comment out padding_mask
nano +635 "/home/user/miniconda3/envs/textgen/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py"

thistleknot avatar Oct 06 '23 04:10 thistleknot

Hm, that's weird. I have nvidia-smi showing CUDA 12.0 on host and I can run python -m bitsandbytes successfully in docker.

If you have the axolotl repo cloned, do you want to try building the image yourself?

docker compose build
docker compose up -d
docker ps
docker exec -it <hash> bash
python -m bitsandbytes

NanoCode012 avatar Oct 06 '23 14:10 NanoCode012

I'm getting this same error when using the docker image. NVIDIA-SMI shows CUDA version 12.2. I can run python -m bitsandbytes successfully, though it says that it is targeting CUDA 11.8 (BNB_CUDA_VERSION=118). @NanoCode012 Do I need to try and align these CUDA versions?

@thistleknot, if you comment out the padding_mask line, doesn't this mean that the LLM will be trained on the entire input rather than just the output sections?

ssmi153 avatar Oct 09 '23 01:10 ssmi153

I can confirm that @thistleknot 's fix works (though I'm not sure whether this messes up content masking or not). For me, running on Runpod, this command was: nano +635 "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py" and then commenting out the padding_mask line. This is very much a temporary fix though.

ssmi153 avatar Oct 09 '23 01:10 ssmi153

Sorry for the late reply.

I can run python -m bitsandbytes successfully, though it says that it is targeting CUDA 11.8 (BNB_CUDA_VERSION=118)

Axolotl targets CUDA 11.8 for the default image. You need to adjust the Docker build arg if building it yourself. We also build images for other CUDA versions, which can be pulled from: https://hub.docker.com/r/winglian/axolotl-runpod/tags
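
For reference, the manual override that the bitsandbytes warning describes looks like this (the 12.2 paths are illustrative; they depend on what is actually installed in the image):

    # point bitsandbytes at a specific CUDA build instead of PyTorch's default
    export BNB_CUDA_VERSION=122
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-12.2/lib64
    python -m bitsandbytes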

Regarding the second issue, does this still occur? Would you be able to use FA instead of xformers?

NanoCode012 avatar Mar 30 '24 17:03 NanoCode012