pi-GAN

Unbalanced GPU memory usage

Open skq-cuhk opened this issue 4 years ago • 3 comments

Thanks for the great work! I noticed that the GPU load is unbalanced: there are 7 additional processes on GPU 0, each requiring roughly 500+ MB of GPU memory. These additional processes are triggered by self._distributed_broadcast_coalesced() in torch.nn.parallel.DistributedDataParallel() when instantiating a DDP model. Do you have any idea how to balance the memory requirement across the GPUs? Thank you.

skq-cuhk • Sep 14 '21 14:09

I observed the same problem. Do you have a solution?

YeonsungJung • Apr 25 '22 15:04

> I observed the same problem. Do you have a solution?

Adding torch.cuda.set_device(rank) at the beginning of the training function might help.

KeqiangSun • Apr 26 '22 02:04

It works! God bless you.

YeonsungJung • Apr 26 '22 04:04