bitsandbytes
Using Nerdy Rodent's DreamBooth training, I get an error about CUDA during training.
I am following Nerdy Rodent's DreamBooth local install video step by step, and at the end bitsandbytes gives an error. I tried reinstalling all the CUDA components and also tried the new CUDA 11.8 version, which differs from the video, but it still gives the same error:
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to:
https://github.com/TimDettmers/bitsandbytes/issues
For effortless bug reporting copy-paste your error into this form:
https://docs.google.com/forms/d/e/1FAIpQLScPB8emS3Thkp66nvqwmjTEgxp8Y9ufuWTzFyr9kJ5AoI47dQ/viewform?usp=sf_link
/home/user/anaconda3/envs/diffusers/lib/python3.9/site-packages/bitsandbytes/cuda_setup/paths.py:86: UserWarning: /home/user/anaconda3/envs/diffusers did not contain libcudart.so as expected! Searching further paths...
warn(
/home/user/anaconda3/envs/diffusers/lib/python3.9/site-packages/bitsandbytes/cuda_setup/paths.py:20: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('CompVis/stable-diffusion-v1-4')}
warn(
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: WARNING! libcuda.so not found! Do you have a CUDA driver installed? If you are on a cluster, make sure you are on a CUDA machine!
Traceback (most recent call last):
  File "/home/user/github/diffusers/examples/dreambooth/train_dreambooth.py", line 657, in
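Before swapping CUDA versions, it can help to confirm which of the libraries named in the log above actually exist on disk. A minimal diagnostic sketch (the candidate directories below are common WSL/conda defaults, not guaranteed on every system):

```shell
# Search a list of candidate directories for a CUDA library and report the
# first match; the directories mirror the paths the setup log above probes.
find_lib() {
  lib="$1"; shift
  for dir in "$@"; do
    if [ -e "$dir/$lib" ]; then
      echo "$dir/$lib"
      return 0
    fi
  done
  echo "$lib: not found" >&2
  return 1
}

# Typical locations (adjust for your install); `|| true` so a miss
# doesn't abort the script:
find_lib libcudart.so /usr/local/cuda/lib64 /usr/lib/wsl/lib || true
find_lib libcuda.so /usr/lib/wsl/lib /usr/lib/x86_64-linux-gnu || true
```

If `libcuda.so` only turns up under `/usr/lib/wsl/lib`, that directory has to be on `LD_LIBRARY_PATH`, which is exactly what the fix further down in this thread does.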
11.8 isn't currently supported; you might try an older CUDA library version. I'd go with 11.6 or earlier.
Same error, and I'm on 11.7:
- `diffusers` version: 0.4.0.dev0
- Platform: Linux-5.10.102.1-microsoft-standard-WSL2-x86_64-with-glibc2.31
- Python version: 3.9.13
- PyTorch version (GPU?): 1.12.1+cu116 (True)
- Huggingface_hub version: 0.10.0
- Transformers version: 4.22.2
- Using GPU in script?:
- Using distributed or parallel set-up in script?:
GPU: 1080 Ti
How do I downgrade to 11.6? Do I just copy these commands and it will downgrade, or do I need to uninstall Ubuntu and start all over again?
Or do I need to delete everything CUDA-related with these commands?
Even with those commands, the issue wasn't solved. Eventually, the fastest way to fix two machines with a package manager was to purge everything NVIDIA and CUDA, which I did with:
sudo apt-get remove --purge '^nvidia-.*'
sudo apt-get remove --purge '^libnvidia-.*'
sudo apt-get remove --purge '^cuda-.*'
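After the purge, one way to confirm nothing was left behind is to filter the package list; empty output means clean (a sketch assuming a Debian/Ubuntu system, guarded so it is a no-op elsewhere):

```shell
# Print any nvidia/cuda packages still marked installed ("ii") in
# `dpkg -l` style output read from stdin.
list_leftovers() {
  awk '/^ii/ && ($2 ~ /nvidia|cuda/) { print $2 }'
}

# Skipped silently on systems without dpkg:
{ command -v dpkg >/dev/null 2>&1 && dpkg -l | list_leftovers; } || true
```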
@brentjohnston
What GPU do you have, and what did you select in `accelerate config` when asked [NO/fp16/bf16]?
PS: I tried different selections but nothing changed.
11.8 isn't currently supported; you might try an older CUDA library version. I'd go with 11.6 or earlier.
Can confirm that with CUDA 11.6 it works, at least with a 1080 TI.
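To double-check which toolkit release is actually on the PATH before retrying, one way is to parse nvcc's banner (a sketch; guarded in case nvcc is not installed):

```shell
# Extract the release number (e.g. "11.6") from `nvcc --version` output
# read from stdin.
cuda_release() {
  sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

{ command -v nvcc >/dev/null 2>&1 && nvcc --version | cuda_release; } || true
```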
Nerdy Rodent's guide uses 11.7 in the Pastebin, and in the video he shows 11.8, so neither will work; following that part, it would never have worked.
In the video, the Pastebin, and on my system I use CUDA 11.7.1. Typically, Nvidia updated the day after ;) You'll need to ensure your MS Windows system is up to date as well. If you have old Nvidia drivers in MS Windows, you may need to downgrade CUDA.
Where it says "CUDA SETUP: WARNING! libcuda.so not found! Do you have a CUDA driver installed? If you are on a cluster, make sure you are on a CUDA machine!",
you need to reboot / add the line as stated in the video and shown in the Pastebin file:
export LD_LIBRARY_PATH=/usr/lib/wsl/lib:$LD_LIBRARY_PATH
Correct, this was the main cause, not the CUDA version.
The export LD_LIBRARY_PATH=/usr/lib/wsl/lib:$LD_LIBRARY_PATH line needs to be in the config of the train file.
Even if you reboot, it will still not find CUDA if that line is not added.
But in your video you say, "reboot or add this line", so people take that to mean that if you restart, you don't need to add the line; in fact the line must be added permanently in the config.
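Instead of retyping the export each session, one way to make it permanent is to append it to `~/.bashrc` only if it is not already there; a sketch:

```shell
# Append a line to a file only if that exact line is not already present,
# so repeated runs do not pile up duplicates.
append_once() {
  line="$1"; file="$2"
  grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

append_once 'export LD_LIBRARY_PATH=/usr/lib/wsl/lib:$LD_LIBRARY_PATH' "$HOME/.bashrc"
```

After this, every new shell (and reboot) picks up the WSL library path; putting the same export line at the top of the train script, as shown later in this thread, works too.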
This is super helpful — thank you, everyone! I will add CUDA 11.8 as soon as possible!
CUDA 11.8 was added in the latest release. I also added code that gives some compilation and debugging instructions if the CUDA setup fails.
Sorry to bother, but for us tech newbies, how does one do that?
In your train file:
export LD_LIBRARY_PATH=/usr/lib/wsl/lib:$LD_LIBRARY_PATH
export MODEL_NAME="darkstorm2150/Protogen_x3.4_Official_Release"
export INSTANCE_DIR="training"
export OUTPUT_DIR="my_model"

accelerate launch train_dreambooth.py \
  --pretrained_vae_name_or_path="stabilityai/sd-vae-ft-mse" \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --instance_data_dir=$INSTANCE_DIR \
  --output_dir=$OUTPUT_DIR \
  --train_text_encoder \
  --instance_prompt="laarretaa" \
  --resolution=512 \
  --train_batch_size=1 \
  --learning_rate=1e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --gradient_accumulation_steps=2 \
  --gradient_checkpointing \
  --use_8bit_adam \
  --save_interval=500 \
  --max_train_steps=4500
I have this issue with Nerdy Rodent's guide on oobabooga's text-generation-webui with the one-click installer, on a GTX 1080 Ti in Windows. bitsandbytes cannot find CUDA. What is the solution there? Can I add that line somewhere?
See this post https://github.com/oobabooga/text-generation-webui/issues/20#issuecomment-1411650652 :)
Hi, I got the same error but I don't have the folder "/usr/lib/wsl", could you tell me what the problem might be? Much appreciated!