Docker image is not finding libcudart.so
Please check that this issue hasn't been reported before.
- [X] I searched previous Bug Reports and didn't find any similar reports.
Expected Behavior
python -m bitsandbytes returns no error
Current behaviour
docker run --gpus '"all"' --rm -it winglian/axolotl:main-py3.10-cu118-2.0.1
python -m bitsandbytes
/workspace/bitsandbytes/bitsandbytes/cuda_setup/main.py:166: UserWarning: Found duplicate ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] files: {PosixPath('/usr/local/cuda/lib64/libcudart.so'), PosixPath('/usr/local/cuda/lib64/libcudart.so.11.0')}.. We select the PyTorch default libcudart.so, which is {torch.version.cuda},but this might missmatch with the CUDA version that is needed for bitsandbytes.To override this behavior set the BNB_CUDA_VERSION=<version string, e.g. 122> environmental variableFor example, if you want to use the CUDA version 122BNB_CUDA_VERSION=122 python ...OR set the environmental variable in your .bashrc: export BNB_CUDA_VERSION=122In the case of a manual override, make sure you set the LD_LIBRARY_PATH, e.g.export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-11.2
warn(msg)
DEBUG: Possible options found for libcudart.so: {PosixPath('/usr/local/cuda/lib64/libcudart.so'), PosixPath('/usr/local/cuda/lib64/libcudart.so.11.0')}
CUDA SETUP: PyTorch settings found: CUDA_VERSION=118, Highest Compute Capability: 6.1.
CUDA SETUP: To manually override the PyTorch CUDA version please see:https://github.com/TimDettmers/bitsandbytes/blob/main/how_to_use_nonpytorch_cuda.md
/workspace/bitsandbytes/bitsandbytes/cuda_setup/main.py:166: UserWarning: WARNING: Compute capability < 7.5 detected! Only slow 8-bit matmul is supported for your GPU! If you run into issues with 8-bit matmul, you can try 4-bit quantization: https://huggingface.co/blog/4bit-transformers-bitsandbytes
warn(msg)
CUDA SETUP: Required library version not found: libbitsandbytes_cuda118_nocublaslt.so. Maybe you need to compile it from source?
CUDA SETUP: Defaulting to libbitsandbytes_cpu.so...
================================================ERROR=====================================
CUDA SETUP: CUDA detection failed! Possible reasons:
1. You need to manually override the PyTorch CUDA version. Please see: "https://github.com/TimDettmers/bitsandbytes/blob/main/how_to_use_nonpytorch_cuda.md
2. CUDA driver not installed
3. CUDA not installed
4. You have multiple conflicting CUDA libraries
5. Required library not pre-compiled for this bitsandbytes release!
CUDA SETUP: If you compiled from source, try again with `make CUDA_VERSION=DETECTED_CUDA_VERSION` for example, `make CUDA_VERSION=113`.
CUDA SETUP: The CUDA version for the compile might depend on your conda install. Inspect CUDA version via `conda list | grep cuda`.
================================================================================
CUDA SETUP: Something unexpected happened. Please compile from source:
git clone https://github.com/TimDettmers/bitsandbytes.git
cd bitsandbytes
CUDA_VERSION=118 make cuda11x_nomatmul
python setup.py install
CUDA SETUP: Setup Failed!
Traceback (most recent call last):
File "/root/miniconda3/envs/py3.10/lib/python3.10/runpy.py", line 187, in _run_module_as_main
mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
File "/root/miniconda3/envs/py3.10/lib/python3.10/runpy.py", line 146, in _get_module_details
return _get_module_details(pkg_main_name, error)
File "/root/miniconda3/envs/py3.10/lib/python3.10/runpy.py", line 110, in _get_module_details
__import__(pkg_name)
File "/workspace/bitsandbytes/bitsandbytes/__init__.py", line 6, in <module>
from . import cuda_setup, utils, research
File "/workspace/bitsandbytes/bitsandbytes/research/__init__.py", line 1, in <module>
from . import nn
File "/workspace/bitsandbytes/bitsandbytes/research/nn/__init__.py", line 1, in <module>
from .modules import LinearFP8Mixed, LinearFP8Global
File "/workspace/bitsandbytes/bitsandbytes/research/nn/modules.py", line 8, in <module>
from bitsandbytes.optim import GlobalOptimManager
File "/workspace/bitsandbytes/bitsandbytes/optim/__init__.py", line 6, in <module>
from bitsandbytes.cextension import COMPILED_WITH_CUDA
File "/workspace/bitsandbytes/bitsandbytes/cextension.py", line 20, in <module>
raise RuntimeError('''
RuntimeError:
CUDA Setup failed despite GPU being available. Please run the following command to get more information:
python -m bitsandbytes
Inspect the output of the command and see if you can locate CUDA libraries. You might need to add them
to your LD_LIBRARY_PATH. If you suspect a bug, please take the information from python -m bitsandbytes
and open an issue at: https://github.com/TimDettmers/bitsandbytes/issues
root@e17cbb5279a5:/workspace/bitsandbytes# ^C
root@e17cbb5279a5:/workspace/bitsandbytes# CUDA_VERSION=118 make cuda11x_nomatmul; python setup.py install; python -m bitsandbytes^C
root@e17cbb5279a5:/workspace/bitsandbytes# cd ..
root@e17cbb5279a5:/workspace# ^C
root@e17cbb5279a5:/workspace# nvidia-smi
Fri Oct 6 01:53:52 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 Quadro P1000 Off | 00000000:05:00.0 Off | N/A |
| 40% 52C P8 N/A / N/A | 0MiB / 4096MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| No running processes found |
+---------------------------------------------------------------------------------------+
root@e17cbb5279a5:/workspace#
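For reference, the override suggested by the warning above looks roughly like this (a sketch; the library path is taken from the DEBUG output, so adjust it to wherever the libcudart.so you actually want lives):
# override which CUDA runtime bitsandbytes binds to, per the warning text
export BNB_CUDA_VERSION=118
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64
python -m bitsandbytes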
Steps to reproduce
Run the Docker image with CUDA 12.2 installed on the host and attempt to run python -m bitsandbytes.
Config yaml
No response
Possible solution
No response
Which Operating Systems are you using?
- [X] Linux
- [ ] macOS
- [ ] Windows
Python Version
Version in Docker (Python 3.10)
axolotl branch-commit
main/f90d5bb
Acknowledgements
- [X] My issue title is concise, descriptive, and in title casing.
- [X] I have searched the existing issues to make sure this bug has not been reported yet.
- [X] I am using the latest version of axolotl.
- [X] I have provided enough information for the maintainers to reproduce and diagnose the issue.
Okay, making progress:
git clone -b cuda-122-support-9_0 https://github.com/OpenAccess-AI-Collective/axolotl
cd axolotl/
pip3 install -e .
pip3 install -U git+https://github.com/huggingface/peft.git
# finetune lora
accelerate launch scripts/finetune.py examples/openllama-3b/lora.yml
which gave me a new error:
TypeError: xformers_forward() got an unexpected keyword argument 'padding_mask'
I got further with docker up, but now I get an error on flash_attn_func.
Running accelerate launch scripts/finetune.py examples/openllama-3b/qlora.yml after docker up leads to ImportError: cannot import name 'flash_attn_func' from 'flash_attn'. I tried all of these too:
pip install flash-attn==1.0.9
pip install flash-attn --no-build-isolation
pip3 install -e '.[flash-attn,deepspeed]'
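For what it's worth, flash_attn_func is the flash-attn 2.x import path, so this ImportError usually means a 1.x build ended up installed. A quick way to check (a sketch; the 2.x pin is an assumption, match it to whatever axolotl's extras require):
# see which flash-attn actually got installed
python -c "import flash_attn; print(flash_attn.__version__)"
# if it reports 1.x, replace it with a 2.x build (the pin is an assumption)
pip uninstall -y flash-attn
pip install 'flash-attn>=2.0.0' --no-build-isolation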
I noticed there is Miniconda; I tried activating the py3.9 env and running again, same issue.
I can't seem to get this to work. Either way, running:
pip3 install -e .
pip3 install packaging
pip3 install -e '.[flash-attn,deepspeed]'
pip3 install -U git+https://github.com/huggingface/peft.git
pip3 install -e .
accelerate launch scripts/finetune.py examples/openllama-3b/qlora.yml
results in the padding_mask error.
Two other errors I've gotten: the flash_attn_func error and a TransformerEngine error.
I got it working:
# comment out padding_mask
nano +635 "/home/user/miniconda3/envs/textgen/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py"
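The same edit can be scripted (a sketch; line 635 depends on the installed transformers version, so verify it in an editor first):
# prepend '# ' to line 635 — confirm it really is the padding_mask line first
sed -i '635s/^/# /' "/home/user/miniconda3/envs/textgen/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py"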
Hmm, that's weird. nvidia-smi shows CUDA 12.0 on my host, and I can run python -m bitsandbytes successfully in Docker.
If you have the axolotl repo cloned, do you want to try building the image yourself?
docker compose build
docker compose up -d
docker ps
docker exec -it <hash> bash
python -m bitsandbytes
I'm getting this same error when using the Docker image. nvidia-smi shows CUDA version 12.2. I can run python -m bitsandbytes successfully, though it says that it is targeting CUDA 11.8 (BNB_CUDA_VERSION=118). @NanoCode012 Do I need to try to align these CUDA versions?
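In case it helps: the two numbers don't have to match exactly, since the driver's CUDA version (what nvidia-smi reports) only needs to be at least as new as the runtime the wheels were built against. A quick comparison (a sketch):
# the CUDA runtime PyTorch was built against, and whether the GPU is visible
python -c "import torch; print(torch.version.cuda, torch.cuda.is_available())"
# the driver-side CUDA version appears in the nvidia-smi header
nvidia-smi | head -n 4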
@thistleknot , if you comment out the padding_mask line, doesn't this mean that the LLM will be trained on the entire input rather than just the output sections?
I can confirm that @thistleknot 's fix works (though I'm not sure whether this messes up content masking or not). For me, running on Runpod, this command was: nano +635 "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py" and then commenting out the padding_mask line. This is very much a temporary fix though.
Sorry for the late reply.
I can run python -m bitsandbytes successfully, though it says that it is targeting CUDA 11.8 (BNB_CUDA_VERSION=118)
Axolotl targets CUDA 11.8 for the default image. You need to adjust the Docker build arg if building it yourself. We also build images for other CUDA versions, which can be pulled here: https://hub.docker.com/r/winglian/axolotl-runpod/tags
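If building the image yourself, passing that arg looks something like this (a sketch; the build-arg name and Dockerfile location are assumptions, so check the repo's Dockerfile for the actual names):
# build-arg name and value are assumptions; see docker/Dockerfile in the repo
docker build --build-arg CUDA=121 -t axolotl-custom .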
Regarding the second issue, does this still occur? Would you be able to use FA instead of xformers?