Dreambooth-Stable-Diffusion
DreamBooth Stable Diffusion training now possible in 10 GB VRAM, and it runs about 2 times faster.
Hey, so I managed to run Stable Diffusion DreamBooth training in just 17.7 GB of GPU memory by replacing the attention implementation with the memory-efficient flash attention from xformers. Along with using much less memory, it also runs about 2 times faster. So it's now possible to train SD on 24 GB GPUs. Tested on an Nvidia A10G; took 15-20 mins to train. I hope it's helpful.
Code in my fork: https://github.com/ShivamShrirao/diffusers/blob/main/examples/dreambooth/
Can even train with a batch size of 2.
With some more tweaks it might be possible to train even on 16 GB GPUs.
And it works. Outputs: me in Fortnite.
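For anyone curious, the heart of the change is roughly this (a minimal sketch, not the exact fork code; it assumes xformers is installed and that q/k/v are already reshaped to (batch * heads, seq_len, head_dim) as in diffusers' CrossAttention):

```python
import xformers.ops

def memory_efficient_forward(query, key, value):
    # Computes softmax(QK^T / sqrt(d)) V without materializing the full
    # seq_len x seq_len attention matrix, which is where the VRAM savings
    # (and much of the speedup) come from.
    return xformers.ops.memory_efficient_attention(query, key, value, attn_bias=None)
```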
https://github.com/huggingface/diffusers/pull/554#issuecomment-1258751183
Very cool. Doing what I can for 16 GB too.
I'm running into issues with it finding the GPUs, I think. 4x A10G. I'll post code tomorrow.
Wow, using the 8-bit Adam optimizer from bitsandbytes along with xformers reduces the memory usage to 12.5 GB.
Colab: https://colab.research.google.com/github/ShivamShrirao/diffusers/blob/main/examples/dreambooth/DreamBooth_Stable_Diffusion.ipynb
Code: https://github.com/ShivamShrirao/diffusers/blob/main/examples/dreambooth/
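The optimizer swap looks roughly like this (a hedged sketch, not the exact script code; `unet` stands in for the model being fine-tuned):

```python
import bitsandbytes as bnb

# 8-bit AdamW keeps the optimizer state in 8 bits, shrinking the two
# fp32 Adam state tensors that normally double parameter memory.
optimizer = bnb.optim.AdamW8bit(
    unet.parameters(),  # `unet` assumed to be the model being fine-tuned
    lr=5e-6,
    betas=(0.9, 0.999),
    weight_decay=1e-2,
    eps=1e-8,
)
```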
There is no such file. 404 Client Error: Entry Not Found for url: https://huggingface.co/CompVis/stable-diffusion-v1-4/resolve/main/config.json
Edit: Issue resolved.
Do you have a donation link? I don't have much, but you are doing great work.
Hey, thanks! No donation link, haha. Good to hear you liked it. It has been quite fun for me to do.
@ShivamShrirao I've been trying to run your notebook on Runpod with PyTorch and an A5000, but I'm getting an error during pip install: "Building wheel for xformers (setup.py) ... error". Training starts (with a bitsandbytes bug report) and runs, but after 20 min of training it crashes.
I'd also love to donate if I can get this working.
> There is no such file. 404 Client Error: Entry Not Found for url: https://huggingface.co/CompVis/stable-diffusion-v1-4/resolve/main/config.json
> Edit: Issue resolved.
@Daniel-Kelvich How did you fix this?
@pdjohntony What error are you facing? If it's the 404, it may be due to not being authenticated with the huggingface CLI.
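If it is auth, something like this in a notebook cell should sort it out (a sketch using huggingface_hub; you need an access token):

```python
# Authenticate with the Hugging Face Hub so the gated
# CompVis/stable-diffusion-v1-4 weights can be downloaded.
from huggingface_hub import notebook_login

notebook_login()  # paste a token from huggingface.co/settings/tokens
```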
@ShivamShrirao I managed to get your dreambooth example working, but it's been running for 2 hours now on an A5000.
Since that's taking so long, I spun up another instance on vast with two A5000s, but now I'm getting the 404. It shouldn't be an auth issue with huggingface, as I logged in on the CLI and it appeared to download the model for a while before hitting this 404 error.
The following values were not passed to `accelerate launch` and had defaults used instead:
`--num_cpu_threads_per_process` was set to `24` to improve out-of-box performance
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
Traceback (most recent call last):
File "/opt/conda/envs/ldm/lib/python3.8/site-packages/transformers/configuration_utils.py", line 596, in _get_config_dict
resolved_config_file = cached_path(
File "/opt/conda/envs/ldm/lib/python3.8/site-packages/transformers/utils/hub.py", line 282, in cached_path
output_path = get_from_cache(
File "/opt/conda/envs/ldm/lib/python3.8/site-packages/transformers/utils/hub.py", line 486, in get_from_cache
_raise_for_status(r)
File "/opt/conda/envs/ldm/lib/python3.8/site-packages/transformers/utils/hub.py", line 409, in _raise_for_status
raise EntryNotFoundError(f"404 Client Error: Entry Not Found for url: {request.url}")
transformers.utils.hub.EntryNotFoundError: 404 Client Error: Entry Not Found for url: https://huggingface.co/CompVis/stable-diffusion-v1-4/resolve/main/config.json
Great work! I managed to run it in a Google colab. I was just wondering, how do I get checkpoint files that I can use later from the model files that are stored?
I could only find the
feature_extractor logs model_index.json safety_checker scheduler text_encoder tokenizer unet vae
folders/files stored in --output_dir=$OUTPUT_DIR after training finished.
@roar-emaus These are the diffusers version of the weights. I have added an inference example in the colab showing how to use them with diffusers. For other repos you will need to convert them.
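The inference usage is roughly this (a hedged sketch; OUTPUT_DIR is a placeholder for whatever --output_dir you trained with):

```python
# Load the saved diffusers folder and sample from it.
import torch
from diffusers import StableDiffusionPipeline

OUTPUT_DIR = "/content/models/sks"  # placeholder path
pipe = StableDiffusionPipeline.from_pretrained(
    OUTPUT_DIR, torch_dtype=torch.float16
).to("cuda")
image = pipe("photo of sks dog in a bucket").images[0]
image.save("sample.png")
```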
Thank you! Will test it tomorrow :)
Finally got it to work! How can we reuse the model in a regular Stable Diffusion colab, @ShivamShrirao? I have used the inference example, but how do I save my model? I haven't even been able to find which folder it's in, lol. Any info on how to convert it into a ckpt? Great work!!
I haven't figured out yet how to convert to a single ckpt to use in other repos. Currently the whole folder is your model; you can save the whole folder until someone figures it out. This script would need to be reversed: https://github.com/huggingface/diffusers/blob/main/scripts/convert_original_stable_diffusion_to_diffusers.py
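Until then, something like this at least makes the folder easy to keep around (a hedged sketch with placeholder paths):

```python
# Zip the whole diffusers output folder (it IS the model) so it can be
# downloaded or copied to Drive in one piece.
import shutil

shutil.make_archive("/content/sks_model", "zip", "/content/models/sks")
```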
@ShivamShrirao If I'm reading things right, 8-bit AdamW should be a drop-in replacement, and the modified CrossAttention class seems like it should be able to replace the one in ldm/modules/attention.py in this repository. Sadly I can't test it myself, because bitsandbytes has a C extension that uses CUDA and I'm on AMD.
Successfully trained one model, but on my second training run I'm getting an error @ShivamShrirao
Steps: 2% 18/1000 [00:56<45:45, 2.80s/it, loss=0.536, lr=5e-6]Traceback (most recent call last):
File "train_dreambooth.py", line 606, in
Very nice progress! Digging in more now
@pdjohntony Try updating the transformers library: pip install -U transformers
@ShivamShrirao I'm assuming you mean only the items in the imv folder make up the ckpt file; I deleted my colab and only saved those items to my Google Drive.
@ShivamShrirao
In the colab,
--instance_prompt="photo of imv{CLASS_NAME}" \
--class_prompt="photo of a {CLASS_NAME}" \
are not f-strings. They should be, right?
Cheers
@binarymind Not required here, because the cell executes as a shell command and IPython expands the {CLASS_NAME} braces itself (see the sketch below).
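A quick way to see the expansion (a tiny sketch; any variable works):

```python
# In a Colab/IPython cell, Python variables inside {} are substituted
# before the line reaches the shell, so no f-string is needed.
CLASS_NAME = "dog"
!echo "photo of a {CLASS_NAME}"   # prints: photo of a dog
```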
Ok, thanks!
While running this cell I got the following result:
The following values were not passed to `accelerate launch` and had defaults used instead:
`--num_processes` was set to a value of `1`
`--num_machines` was set to a value of `1`
`--mixed_precision` was set to a value of `'no'`
`--num_cpu_threads_per_process` was set to `32` to improve out-of-box performance
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
/opt/conda/lib/python3.7/site-packages/accelerate/accelerator.py:179: UserWarning: `log_with=tensorboard` was passed but no supported trackers are currently installed.
warnings.warn(f"`log_with={log_with}` was passed but no supported trackers are currently installed.")
Fetching 16 files: 100%|█████████████████████| 16/16 [00:00<00:00, 13678.94it/s]
Generating class images: 0%| | 0/25 [00:00<?, ?it/s]FATAL: this function is for sm80, but was built for sm750
FATAL: this function is for sm80, but was built for sm750
My nvidia-smi output is the following:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.94       Driver Version: 470.94       CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA RTX A6000    On   | 00000000:0F:00.0 Off |                  Off |
| 30%   27C    P8    26W / 300W |      1MiB / 48685MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
I also tried running
%pip install git+https://github.com/facebookresearch/xformers@1d31a3a#egg=xformers
a few cells above, as it was not working.
Currently stuck there.
Lol, I fixed my problem by removing the f-strings I added... sorry.
Edit: ah, nope, it was not that. Launched the notebook again in a new repo and the problem appeared again; looking into it.
I'm hoping for a (fingers crossed, not too distant) future version of this that can run within the limits of a 3080. That will put it within reach of many more people, including myself. Keep up the great work!!
I'm not having any success. Trying to use a V100 on colab.
Generating class images: 0% 0/50 [00:06<?, ?it/s]
Traceback (most recent call last):
File "train_dreambooth.py", line 606, in <module>
main()
File "train_dreambooth.py", line 362, in main
images = pipeline(example["prompt"]).images
File "/usr/local/lib/python3.7/dist-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py", line 259, in __call__
noise_pred = self.unet(latent_model_input, t, encoder_hidden_states=text_embeddings).sample
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/diffusers/models/unet_2d_condition.py", line 254, in forward
encoder_hidden_states=encoder_hidden_states,
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/diffusers/models/unet_blocks.py", line 565, in forward
hidden_states = attn(hidden_states, context=encoder_hidden_states)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/diffusers/models/attention.py", line 155, in forward
hidden_states = block(hidden_states, context=context)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/diffusers/models/attention.py", line 204, in forward
hidden_states = self.attn1(self.norm1(hidden_states)) + hidden_states
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/diffusers/models/attention.py", line 288, in forward
hidden_states = xformers.ops.memory_efficient_attention(query, key, value)
File "/usr/local/lib/python3.7/dist-packages/xformers/ops.py", line 575, in memory_efficient_attention
query=query, key=key, value=value, attn_bias=attn_bias, p=p
File "/usr/local/lib/python3.7/dist-packages/xformers/ops.py", line 196, in forward_no_grad
causal=isinstance(attn_bias, LowerTriangularMask),
File "/usr/local/lib/python3.7/dist-packages/torch/_ops.py", line 143, in __call__
return self._op(*args, **kwargs or {})
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Traceback (most recent call last):
File "/usr/local/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/accelerate_cli.py", line 43, in main
args.func(args)
File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/launch.py", line 837, in launch_command
simple_launcher(args)
File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/launch.py", line 354, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', 'train_dreambooth.py', '--pretrained_model_name_or_path=CompVis/stable-diffusion-v1-4', '--use_auth_token', '--instance_data_dir=/content/data/sks', '--class_data_dir=/content/data/dog', '--output_dir=/content/models/sks', '--with_prior_preservation', '--instance_prompt=photo of sks dog', '--class_prompt=photo of a dog', '--resolution=512', '--use_8bit_adam', '--train_batch_size=1', '--gradient_accumulation_steps=1', '--learning_rate=5e-6', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--num_class_images=200', '--max_train_steps=600']' returned non-zero exit status 1
@JoeMcGuire You will need to compile xformers yourself; the current wheels only support the T4 GPU.
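Building from source in the notebook might look like this (a hedged sketch; pick the arch list for your GPU, e.g. 7.0 for V100 or 6.0 for P100, and expect a long compile):

```python
# TORCH_CUDA_ARCH_LIST is read by the PyTorch extension builder, so the
# compiled CUDA kernels match the runtime GPU (e.g. sm_70 for V100).
%env TORCH_CUDA_ARCH_LIST=7.0
%pip install git+https://github.com/facebookresearch/xformers@1d31a3a#egg=xformers
```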
There are precompiled xformers wheels for the P100 in this colab (under "Installing xformers"); how can those be incorporated into dreambooth? That would cover Colab Pro: https://colab.research.google.com/github/TheLastBen/fast-stable-diffusion/blob/main/fast_stable_diffusion_AUTOMATIC1111.ipynb#scrollTo=a---cT2rwUQj
Also, how about an optional Google Drive cell to upload the trained model, plus a prune cell to get it down to 2 GB? If any of you compile a wheel for the P100, please store it in Google Drive to share.
Yeah, right now it's kind of unusable in the webUIs, and most people are on webUIs; huggingface love their bins. Also, the default of 600 steps is pretty bad. Not sure why that's the default? It should be more like at least 2000.
Any chance of running this on a 12 GB RTX 3060?
I'm getting a "Tried to allocate 4.00 GiB (GPU 0; 12.00 GiB total capacity; 4.81 GiB already allocated; 890.00 MiB free; 8.81 GiB reserved in total by PyTorch)" error even with the --use_8bit_adam flag.