
RuntimeError: CUDA out of memory

Open · thiagosmagalhaes opened this issue 2 years ago • 10 comments

My configs:

32GB RAM, RTX 3070 Ti

I always get this error. Can anyone suggest optimizations I can apply to avoid it?

RuntimeError: CUDA out of memory. Tried to allocate 64.00 MiB (GPU 0; 8.00 GiB total capacity; 7.09 GiB already allocated; 0 bytes free; 7.23 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

thiagosmagalhaes avatar Dec 14 '22 21:12 thiagosmagalhaes
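(For anyone landing here from the same traceback: the error message itself points at `max_split_size_mb`. A minimal sketch of setting it before PyTorch makes its first CUDA allocation; the `128` is an illustrative value, not a recommendation from this thread.)

```python
import os

# PYTORCH_CUDA_ALLOC_CONF is read when the CUDA caching allocator starts,
# so set it before the first allocation. 128 MiB is only an example cap.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch

x = torch.zeros(1, device="cuda")  # first allocation picks up the setting
```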

What are you trying to do when you're running out of memory?

hdon96 avatar Dec 14 '22 21:12 hdon96

Are you using LORA?

a-l-e-x-d-s-9 avatar Dec 14 '22 21:12 a-l-e-x-d-s-9

Are you using LORA?

I enabled this and now it seems to be flowing, training is in progress, will report back if I succeed

thiagosmagalhaes avatar Dec 14 '22 22:12 thiagosmagalhaes

Are you using LORA?

What exactly does LORA do?

thiagosmagalhaes avatar Dec 14 '22 22:12 thiagosmagalhaes

https://github.com/cloneofsimo/lora It's a lighter-weight approximation of Dreambooth. It doesn't work properly with gradient checkpointing for now, so don't waste time training with that enabled.

hdon96 avatar Dec 14 '22 22:12 hdon96
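(For context: the core idea of LORA is to freeze the pretrained weights and train only a small low-rank delta on top of them, which is why it needs far less VRAM. A minimal illustrative sketch of the technique, not cloneofsimo's actual implementation:)

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wrap a frozen Linear layer with a trainable low-rank update B @ A."""

    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 1.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # pretrained weights stay frozen

        # B starts at zero, so the wrapped layer initially matches the base.
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale
```

Only `A` and `B` receive gradients, so gradient and optimizer-state memory shrink to a small fraction of full fine-tuning.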

Without LORA you can't train at all on an 8GB VRAM card.

a-l-e-x-d-s-9 avatar Dec 14 '22 22:12 a-l-e-x-d-s-9

Note that the reason it 'runs out of memory' is that the Windows driver model (WDDM2) reserves 8-15% of the VRAM for itself, even on a dedicated card, if you have two GPUs (such as integrated graphics plus a dedicated card, a common setup in gaming laptops). So you can't use all of your VRAM for training. PyTorch memory also gets fragmented: even if you have free memory that isn't reserved by Windows, it can't be allocated unless a large enough chunk is contiguous.

Thomas-MMJ avatar Dec 14 '22 23:12 Thomas-MMJ
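(If you want to check whether fragmentation, rather than true exhaustion, is what you're hitting, PyTorch exposes the allocator state. A quick diagnostic sketch:)

```python
import torch

torch.zeros(1, device="cuda")  # ensure the CUDA allocator is initialized

# A large gap between "reserved" and "allocated" is the fragmentation
# symptom described above: PyTorch holds the memory but can't hand out
# a contiguous block of the requested size.
print(f"allocated: {torch.cuda.memory_allocated() / 2**20:.0f} MiB")
print(f"reserved:  {torch.cuda.memory_reserved() / 2**20:.0f} MiB")
print(torch.cuda.memory_summary())
```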

I have a 1080 Ti and I'm currently able to train v1.5 at 512 with or without LORA.

hdon96 avatar Dec 15 '22 00:12 hdon96

I have a 1080 Ti and I'm currently able to train v1.5 at 512 with or without LORA.

Sorry, I deleted the comment you were replying to, haha. Immediately after writing it I tried to start training for the n-th time, and now everything just works, so... I'll have to chalk the 24 hours of failed tests up to ghosts in the computer for now.

james-things avatar Dec 15 '22 00:12 james-things

@kitoide, I have a 3090 Ti with 24GB of VRAM. I have to turn on "8bit Adam" under Parameters->Advanced. Make sure you leave "Gradient Checkpointing" enabled. These are both memory optimizations, sacrificing speed. I'm also using the "--xformers" startup option, but I'm not sure if that's necessary. Finally, make sure both "Batch Size" and "Class Batch Size" are set to 1. Increasing the batch sizes uses more VRAM.

MarcusAdams-v006200 avatar Dec 15 '22 14:12 MarcusAdams-v006200
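(For reference, "8bit Adam" is the bitsandbytes 8-bit optimizer, which stores optimizer state in 8-bit to save VRAM. Roughly what gets enabled under the hood; `model` below is a stand-in module, not the extension's actual UNet wiring:)

```python
import torch.nn as nn
import bitsandbytes as bnb  # pip install bitsandbytes

model = nn.Linear(768, 768)  # stand-in; the extension optimizes the UNet

# Drop-in replacement for torch.optim.Adam whose moment buffers are kept
# in 8-bit, roughly quartering optimizer-state VRAM vs. fp32 Adam.
optimizer = bnb.optim.Adam8bit(model.parameters(), lr=1e-6)
```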

How much VRAM do you need to train the new SD 2.0 and 2.1 models? I run out of memory with 12GB of VRAM. I tried all the memory attention options, and I have all the VRAM-saving options enabled except Train Text Encoder and Use CPU. Still can't get it to train. Has anyone here successfully gotten training to start on 2.x with 12GB of VRAM?

nanafy avatar Dec 17 '22 21:12 nanafy

Note that the reason it 'runs out of memory' is that the Windows driver model (WDDM2) reserves 8-15% of the VRAM for itself, even on a dedicated card, if you have two GPUs (such as integrated graphics plus a dedicated card, a common setup in gaming laptops). So you can't use all of your VRAM for training. PyTorch memory also gets fragmented: even if you have free memory that isn't reserved by Windows, it can't be allocated unless a large enough chunk is contiguous.

I turned off my integrated GPU.

nanafy avatar Dec 17 '22 21:12 nanafy

@kitoide, I have a 3090 Ti with 24GB of VRAM. I have to turn on "8bit Adam" under Parameters->Advanced. Make sure you leave "Gradient Checkpointing" enabled. These are both memory optimizations, sacrificing speed. I'm also using the "--xformers" startup option, but I'm not sure if that's necessary. Finally, make sure both "Batch Size" and "Class Batch Size" are set to 1. Increasing the batch sizes uses more VRAM.

I've done all of this, to no avail!

thiagosmagalhaes avatar Dec 18 '22 14:12 thiagosmagalhaes

There should be some way to train within whatever memory is available and avoid overflowing VRAM.

thiagosmagalhaes avatar Dec 18 '22 14:12 thiagosmagalhaes

I'll be following this thread. I've heard people say they trained using a 1080, so there must be something we can do, lol. But at the same time it's still pretty inconsistent, since the people I saw training with 8GB had the same settings as I did. (I'm running an RTX 3070.)

coldasicee avatar Dec 19 '22 23:12 coldasicee

Some recent regressions in VRAM usage: I was able to train with LORA a week ago, and now I can't train anything.

Thomas-MMJ avatar Dec 21 '22 02:12 Thomas-MMJ

This issue is stale because it has been open 5 days with no activity. Remove the stale label or comment, or this will be closed in 5 days.

github-actions[bot] avatar Dec 30 '22 00:12 github-actions[bot]

Some recent regressions in VRAM usage: I was able to train with LORA a week ago, and now I can't train anything.

New version out. Give it a go.

d8ahazard avatar Jan 01 '23 19:01 d8ahazard

I'm unable to train with LORA now as well. This is the first time I've tried LORA, so I don't know if it's related. I'm on 8GB as well.

Sorry for bothering you.

(screenshot attached)

nonetrix avatar Jan 13 '23 07:01 nonetrix

I'm unable to train with LORA now as well. This is the first time I've tried LORA, so I don't know if it's related. I'm on 6GB as well.

Sorry for bothering you.

(screenshot attached)

sunyiwk avatar Mar 24 '23 09:03 sunyiwk