
Training still OOM on 8GB GPU.

Open lllyasviel opened this issue 2 years ago • 1 comment

It seems I have already used many tricks, but training still OOMs on an 8GB GPU. Inference is good now, though.

This is strange, because I know some textual inversion or DreamBooth setups can be trained on 8GB.

What is the secret of Automatic1111's optimization? Although xformers may help a bit, the current sliced attention should require even less memory than xformers.

Does it make sense to move the text encoder and VAE off the GPU when training?

lllyasviel avatar Feb 12 '23 18:02 lllyasviel

Someone linked this issue saying you were looking for help; perhaps I can provide some insights.

You can cache the latents of your training images ahead of time to disk or system RAM, though the VAE isn't that large either. Caching too much to system RAM can drive your RAM usage up a lot. I purposely do not cache latents in EveryDream because it breaks crop jitter, but if your absolute goal is minimum VRAM use, it makes sense. Caching to disk is the safest: save the latents as .pt files (torch tensors) and read them on the fly, perhaps with a hash dictionary mapping filename to tensor. Keep in mind that building the cache ahead of time will take a while for larger datasets; it depends on your goals.
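For what it's worth, here's a minimal sketch of that disk-caching idea, assuming a diffusers-style AutoencoderKL as the VAE; the cache directory, the 512px resolution, and the helper names are all illustrative, not anything from ControlNet itself:

```python
import hashlib
from pathlib import Path

import torch
from PIL import Image
from torchvision import transforms

# Hypothetical cache location; adjust to your setup.
CACHE_DIR = Path("latent_cache")
CACHE_DIR.mkdir(exist_ok=True)

preprocess = transforms.Compose([
    transforms.Resize(512),
    transforms.CenterCrop(512),
    transforms.ToTensor(),
    transforms.Normalize([0.5], [0.5]),  # map pixels to [-1, 1] as SD's VAE expects
])

@torch.no_grad()
def cache_latent(vae, image_path: str) -> Path:
    """Encode one image with the VAE and save the latent as a .pt file."""
    key = hashlib.sha1(image_path.encode()).hexdigest()
    out = CACHE_DIR / f"{key}.pt"
    if out.exists():  # skip work if this image was already cached
        return out
    image = Image.open(image_path).convert("RGB")
    pixels = preprocess(image).unsqueeze(0).to(vae.device, vae.dtype)
    # Sample the VAE posterior and apply SD's latent scaling factor.
    latent = vae.encode(pixels).latent_dist.sample() * vae.config.scaling_factor
    torch.save(latent.squeeze(0).cpu(), out)
    return out

def load_latent(image_path: str) -> torch.Tensor:
    """Read a cached latent back during training instead of running the VAE."""
    key = hashlib.sha1(image_path.encode()).hexdigest()
    return torch.load(CACHE_DIR / f"{key}.pt")
```

With all latents on disk, the VAE never needs to be loaded onto the GPU during training; the dataloader just reads tensors back.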

I've found text encoder training makes a big difference when doing normal unfrozen UNet training. I'm not sure whether you intend to unfreeze the text encoder, but if not, you could cache the embedding outputs of the text encoder as well: tokenize the captions, encode them, then cache those values the same way as above. This obviously means you cannot train the text encoder at all.
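A sketch of the same idea for a frozen text encoder, assuming Stable Diffusion's CLIP tokenizer and text encoder from transformers; the function name and the caption-keyed cache layout are illustrative:

```python
import torch

@torch.no_grad()
def cache_text_embedding(tokenizer, text_encoder, caption: str) -> torch.Tensor:
    """Tokenize and encode a caption once, so the text encoder never has to
    live on the GPU during training. Only valid if the encoder stays frozen."""
    tokens = tokenizer(
        caption,
        padding="max_length",
        max_length=tokenizer.model_max_length,
        truncation=True,
        return_tensors="pt",
    )
    # last_hidden_state is what the SD UNet consumes as cross-attention context.
    embedding = text_encoder(tokens.input_ids.to(text_encoder.device))[0]
    return embedding.squeeze(0).cpu()

# Build the cache up front, then persist it, e.g.:
# embedding_cache = {c: cache_text_embedding(tokenizer, text_encoder, c)
#                    for c in captions}
# torch.save(embedding_cache, "text_embedding_cache.pt")
```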

An extra flag in your zero_grad call (optimizer.zero_grad(set_to_none=True)) will save a small amount of VRAM, assuming you're on a new enough version of PyTorch.
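For reference, a minimal sketch of where the flag sits in a training step; dataloader and compute_loss are stand-ins, not anything from this repo. The flag replaces the .grad tensors with None instead of zeroing them in place, so their memory can be reclaimed between steps:

```python
for batch in dataloader:            # dataloader/compute_loss are hypothetical
    loss = compute_loss(batch)
    loss.backward()
    optimizer.step()
    # Drop the .grad tensors entirely instead of filling them with zeros,
    # freeing that memory until the next backward pass recreates them.
    optimizer.zero_grad(set_to_none=True)
```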

victorchall avatar Feb 12 '23 19:02 victorchall

The HuggingFace Diffusers ControlNet training script (https://huggingface.co/docs/diffusers/training/controlnet) uses different performance optimizations: it needs a minimum of 8GB on Linux with DeepSpeed (which is only available on Linux right now), and 12GB on Windows.

geroldmeisinger avatar Sep 17 '23 10:09 geroldmeisinger