ControlNet
Why can't `pl.Trainer` handle the multi-GPU case?
I can run the original tutorial_train.py on a single 3090 Ti GPU (24 GB) with batch_size 3.
However, when upgrading to 2 or more GPUs, it keeps warning OOM.
trainer = pl.Trainer(gpus=2, precision=32, callbacks=[logger])
I am curious why. Why can a single GPU handle batch 3 while multiple GPUs can only handle 1? The GPUs hold their own batches in parallel, am I right?
Because one GPU needs to compute `gradient = (gradient_from_gpu_1 + gradient_from_gpu_2) / 2`. This computation takes a lot of VRAM.
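For intuition, here is a minimal sketch of that DP-style aggregation, assuming two replicas whose gradients are gathered onto `cuda:0`; the function and setup are hypothetical, not taken from the ControlNet code:

```python
import torch

def average_gradients(replicas):
    # Gather each parameter's gradient from every replica onto GPU 0
    # and average them there. GPU 0 temporarily holds one extra
    # gradient tensor per replica, which is the extra VRAM cost.
    for params in zip(*(r.parameters() for r in replicas)):
        grads = [p.grad.to("cuda:0") for p in params]
        params[0].grad = torch.stack(grads).mean(dim=0)
```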
Thanks @lllyasviel!
So basically the bottleneck is the GPU holding the gradient averaging, while the rest should work fine: e.g., GPU0 requires 24G+24G, while GPU1, GPU2, and GPU3 each require 24G.
Thus, we should leave GPU0 some headroom, e.g., run with GPU0 at 12G+12G and GPU1, GPU2, GPU3 at 12G each.
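As a back-of-the-envelope check (illustrative numbers from this thread, assuming GPU0 holds one extra copy of the gathered gradients, not measurements):

```python
# Memory budget under the aggregation hypothesis above.
per_replica_gb = 12      # weights + activations + own gradients per GPU
gathered_extra_gb = 12   # gradients gathered onto GPU 0 for averaging
print(f"GPU0 needs ~{per_replica_gb + gathered_extra_gb} GB, "
      f"each other GPU needs ~{per_replica_gb} GB")
```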
Sorry, I have another question.
I come from the recognition community. In recognition, multi-GPU training normally doesn't produce significantly different memory usage across GPUs. Does this "1-big-GPU" behavior only happen in Stable Diffusion / ControlNet?
Use the FSDP or DeepSpeed training strategy.
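A minimal sketch of wiring that into the trainer, assuming a PyTorch Lightning version where the `strategy` argument and the `deepspeed_stage_2` alias are available (strategy names vary across versions, so check your install):

```python
import pytorch_lightning as pl

# Shard optimizer state and gradients across GPUs instead of
# aggregating them on one device. `logger` is the callback from
# the original tutorial_train.py.
trainer = pl.Trainer(
    gpus=2,
    precision=16,                  # DeepSpeed is typically run in fp16
    strategy="deepspeed_stage_2",  # or an FSDP strategy, e.g. "fsdp"
    callbacks=[logger],
)
```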
The HuggingFace Diffusers ControlNet training script (https://huggingface.co/docs/diffusers/training/controlnet) has different optimizations built in.
All duplicates concerning "Multi GPU":
- https://github.com/lllyasviel/ControlNet/issues/148
- https://github.com/lllyasviel/ControlNet/issues/314
- https://github.com/lllyasviel/ControlNet/issues/319
- https://github.com/lllyasviel/ControlNet/issues/507