
RuntimeError: CUDA out of memory

Open · thiagosmagalhaes opened this issue 2 years ago • 10 comments

My configs:

32GB RAM, RTX 3070 Ti

I always get this error. Can anyone suggest optimizations I can apply to avoid it?

RuntimeError: CUDA out of memory. Tried to allocate 64.00 MiB (GPU 0; 8.00 GiB total capacity; 7.09 GiB already allocated; 0 bytes free; 7.23 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

thiagosmagalhaes avatar Dec 14 '22 21:12 thiagosmagalhaes
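(For anyone landing here from the same traceback: the error message itself points at `max_split_size_mb`. A minimal sketch of setting it before PyTorch makes its first CUDA allocation; the `128` is an illustrative value, not a recommendation from this thread.)

```python
import os

# PYTORCH_CUDA_ALLOC_CONF is read when the CUDA caching allocator starts,
# so set it before the first allocation. 128 MiB is only an example cap.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch

x = torch.zeros(1, device="cuda")  # first allocation picks up the setting
```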

What are you trying to do when you're running out of memory?

hdon96 avatar Dec 14 '22 21:12 hdon96

Are you using LORA?

a-l-e-x-d-s-9 avatar Dec 14 '22 21:12 a-l-e-x-d-s-9

Are you using LORA?

I enabled this and now it seems to be flowing, training is in progress, will report back if I succeed

thiagosmagalhaes avatar Dec 14 '22 22:12 thiagosmagalhaes

Are you using LORA?

What exactly does LORA do?

thiagosmagalhaes avatar Dec 14 '22 22:12 thiagosmagalhaes

https://github.com/cloneofsimo/lora It's a lighter-weight approximation of Dreambooth. It doesn't work properly with gradient checkpointing for now, so don't waste time training with that enabled.

hdon96 avatar Dec 14 '22 22:12 hdon96
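(For context: the core idea of LORA is to freeze the pretrained weights and train only a small low-rank delta on top of them, which is why it needs far less VRAM. A minimal illustrative sketch of the technique, not cloneofsimo's actual implementation:)

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wrap a frozen Linear layer with a trainable low-rank update B @ A."""

    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 1.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # pretrained weights stay frozen

        # B starts at zero, so the wrapped layer initially matches the base.
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale
```

Only `A` and `B` receive gradients, so gradient and optimizer-state memory shrink to a small fraction of full fine-tuning.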

Without LORA you can't train at all on an 8GB VRAM card.

a-l-e-x-d-s-9 avatar Dec 14 '22 22:12 a-l-e-x-d-s-9

Note that the reason it 'runs out of memory' is that the Windows driver model (WDDM2) reserves 8-15% of the VRAM for itself, even on a dedicated card, if you have two GPUs (such as integrated graphics plus a dedicated card, a common setup in gaming laptops). So you can't use all of your VRAM for training. PyTorch memory also gets fragmented: even if you have free memory that isn't reserved by Windows, it can't be allocated unless a large enough chunk is contiguous.

Thomas-MMJ avatar Dec 14 '22 23:12 Thomas-MMJ
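(If you want to check whether fragmentation, rather than true exhaustion, is what you're hitting, PyTorch exposes the allocator state. A quick diagnostic sketch:)

```python
import torch

torch.zeros(1, device="cuda")  # ensure the CUDA allocator is initialized

# A large gap between "reserved" and "allocated" is the fragmentation
# symptom described above: PyTorch holds the memory but can't hand out
# a contiguous block of the requested size.
print(f"allocated: {torch.cuda.memory_allocated() / 2**20:.0f} MiB")
print(f"reserved:  {torch.cuda.memory_reserved() / 2**20:.0f} MiB")
print(torch.cuda.memory_summary())
```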

I have a 1080 Ti and I'm currently able to train v1.5 at 512 with or without LORA.

hdon96 avatar Dec 15 '22 00:12 hdon96

I have a 1080 Ti and I'm currently able to train v1.5 at 512 with or without LORA.

Sorry, I deleted the comment you were replying to, haha. Immediately after writing it I tried to start training for the n-th time, and now everything just works, so... I'll have to chalk the 24 hours of failed tests up to ghosts in the computer for now.

james-things avatar Dec 15 '22 00:12 james-things

@kitoide, I have a 3090 Ti with 24GB of VRAM. I have to turn on "8bit Adam" under Parameters->Advanced. Make sure you leave "Gradient Checkpointing" enabled. These are both memory optimizations, sacrificing speed. I'm also using the "--xformers" startup option, but I'm not sure if that's necessary. Finally, make sure both "Batch Size" and "Class Batch Size" are set to 1. Increasing the batch sizes uses more VRAM.

MarcusAdams-v006200 avatar Dec 15 '22 14:12 MarcusAdams-v006200
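(For reference, "8bit Adam" is the bitsandbytes 8-bit optimizer, which stores optimizer state in 8-bit to save VRAM. Roughly what gets enabled under the hood; `model` below is a stand-in module, not the extension's actual UNet wiring:)

```python
import torch.nn as nn
import bitsandbytes as bnb  # pip install bitsandbytes

model = nn.Linear(768, 768)  # stand-in; the extension optimizes the UNet

# Drop-in replacement for torch.optim.Adam whose moment buffers are kept
# in 8-bit, roughly quartering optimizer-state VRAM vs. fp32 Adam.
optimizer = bnb.optim.Adam8bit(model.parameters(), lr=1e-6)
```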

How much VRAM do you need to train the new SD 2.0 and 2.1 models? I run out of memory with 12GB of VRAM. I tried all the memory attention options, and I have all the VRAM-saving options enabled except Train Text Encoder and Use CPU. Still can't get it to train. Has anyone here successfully gotten training to start on 2.x with 12GB of VRAM?

nanafy avatar Dec 17 '22 21:12 nanafy

Note that the reason it 'runs out of memory' is that the Windows driver model (WDDM2) reserves 8-15% of the VRAM for itself, even on a dedicated card, if you have two GPUs (such as integrated graphics plus a dedicated card, a common setup in gaming laptops). So you can't use all of your VRAM for training. PyTorch memory also gets fragmented: even if you have free memory that isn't reserved by Windows, it can't be allocated unless a large enough chunk is contiguous.

I turned off my integrated GPU.

nanafy avatar Dec 17 '22 21:12 nanafy

@kitoide, I have a 3090 Ti with 24GB of VRAM. I have to turn on "8bit Adam" under Parameters->Advanced. Make sure you leave "Gradient Checkpointing" enabled. These are both memory optimizations, sacrificing speed. I'm also using the "--xformers" startup option, but I'm not sure if that's necessary. Finally, make sure both "Batch Size" and "Class Batch Size" are set to 1. Increasing the batch sizes uses more VRAM.

I've done all of this, to no avail!

thiagosmagalhaes avatar Dec 18 '22 14:12 thiagosmagalhaes

There should be some way to train within whatever memory is available and avoid overflowing VRAM.

thiagosmagalhaes avatar Dec 18 '22 14:12 thiagosmagalhaes

I'll be following this thread. I've heard people say they trained using a 1080, so there must be something we can do, lol. But at the same time it's still pretty inconsistent, since the people I saw training with 8GB had the same settings as I did. (I'm running an RTX 3070.)

coldasicee avatar Dec 19 '22 23:12 coldasicee

Some recent regressions in VRAM usage: I was able to train with LORA a week ago, and now I can't train anything.

Thomas-MMJ avatar Dec 21 '22 02:12 Thomas-MMJ

This issue is stale because it has been open 5 days with no activity. Remove the stale label or comment, or this will be closed in 5 days.

github-actions[bot] avatar Dec 30 '22 00:12 github-actions[bot]

Some recent regressions in VRAM usage: I was able to train with LORA a week ago, and now I can't train anything.

New version out. Give it a go.

d8ahazard avatar Jan 01 '23 19:01 d8ahazard

I'm unable to train with LORA now as well. This is the first time I've tried LORA, so I don't know if it's related. I'm on 8GB as well.

Sorry for bothering you.

(screenshot attached)

nonetrix avatar Jan 13 '23 07:01 nonetrix

I'm unable to train with LORA now as well. This is the first time I've tried LORA, so I don't know if it's related. I'm on 6GB as well.

Sorry for bothering you.

(screenshot attached)

sunyiwk avatar Mar 24 '23 09:03 sunyiwk