
RuntimeError: CUDA out of memory with RTX 3090 (24 GB VRAM)

Open Tuxius opened this issue 2 years ago • 13 comments

Following the instructions 1:1, I get an out-of-memory error despite having 24 GB of VRAM available:

  File "Y:\221009_dreambooth\ldm\modules\attention.py", line 180, in forward
    sim = einsum('b i d, b j d -> b i j', q, k) * self.scale
  File "C:\Users\frank\anaconda3\envs\dreambooth\lib\site-packages\torch\functional.py", line 327, in einsum
    return _VF.einsum(equation, operands)  # type: ignore[attr-defined]
RuntimeError: CUDA out of memory. Tried to allocate 512.00 MiB (GPU 0; 24.00 GiB total capacity; 22.74 GiB already allocated; 0 bytes free; 23.00 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
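For context, a back-of-the-envelope estimate (my own arithmetic, not from the repo) suggests why the failed allocation is exactly 512 MiB: the `einsum` in the traceback materializes the full attention score matrix `sim` of shape `(batch*heads, tokens, tokens)`. Assuming the usual SD v1 setup at 512px (64×64 latent, 8 attention heads, fp32), that single tensor works out to:

```python
# Back-of-the-envelope size of the attention score tensor that fails to allocate.
# Assumptions (typical for SD v1 at 512px, not taken from this repo's config):
# latent resolution 64x64, 8 attention heads, fp32 (4 bytes per element).
heads = 8
tokens = 64 * 64            # 512px image -> 64x64 latent -> 4096 tokens
bytes_per_elem = 4          # fp32
sim_bytes = heads * tokens * tokens * bytes_per_elem
sim_mib = sim_bytes / 2**20
print(f"sim tensor: {sim_mib:.2f} MiB")  # -> sim tensor: 512.00 MiB
```

That matches the "Tried to allocate 512.00 MiB" in the traceback, which is why memory-efficient attention implementations help so much here.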

I tried some changes in v1-finetune_unfrozen.yaml (e.g. changing num_workers from 2 to 1), but no improvement.
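One more knob worth trying (an assumption on my part, taken from the hint at the end of the traceback rather than from this repo's docs): cap the allocator's split size via `PYTORCH_CUDA_ALLOC_CONF` to reduce fragmentation. The value 512 here is a hypothetical starting point, not a tested one:

```python
import os

# Hypothetical mitigation suggested by the traceback itself: limit the CUDA
# caching allocator's split size to reduce fragmentation. This must be set
# before `import torch`, since the allocator reads the variable at startup.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:512"
```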

Has anybody successfully run this under Windows with 24 GB VRAM?

Tuxius avatar Oct 09 '22 18:10 Tuxius

you can run it on windows wsl2 https://www.youtube.com/watch?v=w6PTviOCYQY&t=15s

mengen-li avatar Oct 10 '22 08:10 mengen-li

Is it possible to run the training on 11 GB VRAM?

wyang22 avatar Oct 12 '22 12:10 wyang22

I was getting a lot of out-of-memory errors on a 24 GB 3090, so I ended up using a bigger server, and saw it would consume up to 28 GB of RAM; it went up to 30 GB at one point.

It's possible some config needs to be tweaked while running; not sure 🤷🏼‍♂️

dminGod avatar Oct 15 '22 17:10 dminGod

@ChinaArvin: Thank you, yes, following these instructions it now works nicely, well below 24 GB. However, having to use WSL feels like a workaround, even though I enjoy the Linux command line. Shouldn't it be possible to get this running under native Windows?

@wyang22: If you sacrifice some settings, even below 11 GB is possible under WSL; just follow the instructions in the video.

Tuxius avatar Oct 15 '22 19:10 Tuxius

I finally ended up using this:

https://github.com/ShivamShrirao/diffusers/tree/main/examples/dreambooth

dminGod avatar Oct 25 '22 02:10 dminGod

It does indeed not work with a 3090 on Windows 11, but it runs fine under WSL on the same machine with the same (default) config. Must be a bug on Windows then...

fr34kyn01535 avatar Nov 05 '22 12:11 fr34kyn01535

@wyang22 see https://github.com/ShivamShrirao/diffusers/tree/main/examples/dreambooth; you have several configurations there:

Use the table below to choose the best flags based on your memory and speed requirements. Tested on Tesla T4 GPU.

| fp16 | train_batch_size | gradient_accumulation_steps | gradient_checkpointing | use_8bit_adam | VRAM usage (GB) | Speed (it/s) |
| ---- | ---------------- | --------------------------- | ---------------------- | ------------- | --------------- | ------------ |
| fp16 | 1 | 1 | TRUE  | TRUE  | 9.92  | 0.93 |
| no   | 1 | 1 | TRUE  | TRUE  | 10.08 | 0.42 |
| fp16 | 2 | 1 | TRUE  | TRUE  | 10.4  | 0.66 |
| fp16 | 1 | 1 | FALSE | TRUE  | 11.17 | 1.14 |
| no   | 1 | 1 | FALSE | TRUE  | 11.17 | 0.49 |
| fp16 | 1 | 2 | TRUE  | TRUE  | 11.56 | 1.0  |
| fp16 | 2 | 1 | FALSE | TRUE  | 13.67 | 0.82 |
| fp16 | 1 | 2 | FALSE | TRUE  | 13.7  | 0.83 |
| fp16 | 1 | 1 | TRUE  | FALSE | 15.79 | 0.77 |
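To make the table concrete, a launch command using the lowest-VRAM row (fp16, batch size 1, gradient checkpointing, 8-bit Adam) might look like the sketch below. It assumes ShivamShrirao's `train_dreambooth.py` and the flags documented in that README; the model name, directories, and prompt are placeholders you would substitute:

```shell
# Sketch using the flags from the ~9.9 GB row of the table above.
# MODEL_NAME, INSTANCE_DIR, and OUTPUT_DIR are placeholder variables.
accelerate launch train_dreambooth.py \
  --pretrained_model_name_or_path="$MODEL_NAME" \
  --instance_data_dir="$INSTANCE_DIR" \
  --output_dir="$OUTPUT_DIR" \
  --instance_prompt="a photo of sks dog" \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=1 \
  --gradient_checkpointing \
  --use_8bit_adam \
  --mixed_precision="fp16" \
  --learning_rate=5e-6 \
  --max_train_steps=400
```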

titusfx avatar Nov 09 '22 11:11 titusfx

Whatever flag I use, I always get the CUDA out-of-memory error. How are you all using this? Can anyone post an example? I'm trying it on a 4090 with 24 GB.

AntouanK avatar Dec 07 '22 21:12 AntouanK

I'm also having OOM errors with a 3090 with 24 GB. Batch size is set to 1, and I even set the precision flag on the Trainer to 16.

jbohnslav avatar Dec 08 '22 20:12 jbohnslav

Did anyone ever find a solution? I am also getting this error on a 3090 Ti.

htsh avatar Dec 23 '22 19:12 htsh

> Whatever flag I use, I always get the CUDA out-of-memory error. How are you all using this? Can anyone post an example? I'm trying it on a 4090 with 24 GB.

I haven't tried with this repo, but if you are trying to train a 768 model and don't have xformers installed correctly, it will go OOM. The 768 model training hovers around 21 GB of VRAM. I think the 512 models should train fine.
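A rough estimate (my own arithmetic, under the hypothetical assumption of 8 attention heads and a latent side of image_size/8) shows why 768 training is so much heavier without xformers: at 768px the latent is 96×96 = 9216 tokens, so each full fp32 attention score tensor is about 5× the size of the 512px case, and xformers' memory-efficient attention avoids materializing it entirely:

```python
# Compare the fp32 attention score tensor for 512px vs 768px training.
# Assumptions (hypothetical): 8 heads, latent resolution = image_size / 8.
def sim_gib(image_size: int, heads: int = 8) -> float:
    tokens = (image_size // 8) ** 2             # e.g. 768 -> 96*96 = 9216 tokens
    return heads * tokens * tokens * 4 / 2**30  # fp32 bytes -> GiB

print(f"512px: {sim_gib(512):.2f} GiB per score tensor")  # -> 0.50
print(f"768px: {sim_gib(768):.2f} GiB per score tensor")  # -> 2.53
```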

dminGod avatar Dec 24 '22 05:12 dminGod

> I finally ended up using this:
>
> https://github.com/ShivamShrirao/diffusers/tree/main/examples/dreambooth

Yesss!! Finally, this worked. I am running on 8 GB and got it to train using the info in this section: https://github.com/ShivamShrirao/diffusers/tree/main/examples/dreambooth#training-on-a-8-gb-gpu

DeepSpeed was the final piece that I needed.

Best of luck!

schematical avatar Apr 13 '23 22:04 schematical

> @wyang22 see https://github.com/ShivamShrirao/diffusers/tree/main/examples/dreambooth; you have several configurations there:
>
> Use the table below to choose the best flags based on your memory and speed requirements. Tested on Tesla T4 GPU.

@titusfx Where can I edit these configurations, or where do I pass them in?

nhatItsforce avatar Aug 14 '23 05:08 nhatItsforce