imagen-pytorch
GPU memory increases with the number of GPUs used
I'm training the base U-Net using the 'accelerate' launcher as described in the repo (i.e., 'accelerate launch train.py').
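For reference, the runs are launched roughly like this, varying the number of processes per run (a sketch; my accelerate config otherwise uses the defaults):

```
# single GPU
accelerate launch --num_processes 1 train.py

# e.g., 8 GPUs on one node
accelerate launch --multi_gpu --num_processes 8 train.py
```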
I make sure the batch size on each GPU is 1, so I would expect the per-GPU memory usage to stay roughly constant no matter how many GPUs I use, since each GPU holds the same model replica and processes the same number of samples per step.
However, I find that the more GPUs I use, the more memory each GPU consumes, even though the per-GPU batch size is still 1.
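For reference, this is roughly how the per-GPU batch size is pinned to 1 (a minimal sketch; the dummy dataset and the exact Accelerate wiring are stand-ins for what train.py actually does):

```python
import torch
from accelerate import Accelerator
from torch.utils.data import DataLoader, TensorDataset

accelerator = Accelerator()

# Dummy stand-in for the real image dataset used by train.py.
dataset = TensorDataset(torch.randn(64, 3, 64, 64))

# With Accelerate, DataLoader batch_size is per process, not global:
# each GPU receives exactly one sample per step regardless of world size.
loader = accelerator.prepare(DataLoader(dataset, batch_size=1, shuffle=True))
```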
For example, the per-GPU memory usage (used, out of 32510 MB per GPU) is:

- 1 GPU: [0] 19876 MB
- 2 GPUs: [0] 23892 MB, [1] 23732 MB
- 3 GPUs: [0] 25132 MB, [1] 24962 MB, [2] 24962 MB
- 8 GPUs: [0] 31176 MB, [1] 31000 MB, [2] 30930 MB, [3] 30958 MB, [4] 30940 MB, [5] 30996 MB, [6] 31070 MB, [7] 30994 MB
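In case it helps with diagnosing this, here is the kind of per-rank logging I could add around a training step to see whether the extra memory shows up inside the PyTorch caching allocator or outside it (a sketch using only standard torch.cuda APIs):

```python
import torch
import torch.distributed as dist

def log_cuda_memory(tag: str) -> None:
    rank = dist.get_rank() if dist.is_initialized() else 0
    # Memory currently held by live tensors on this rank.
    allocated = torch.cuda.memory_allocated() / 2**20
    # Memory reserved by the caching allocator; nvidia-smi additionally counts
    # the CUDA context and communication (e.g., NCCL) buffers, which live
    # outside the allocator.
    reserved = torch.cuda.memory_reserved() / 2**20
    print(f"[rank {rank}] {tag}: allocated={allocated:.0f} MB, reserved={reserved:.0f} MB")
```

If allocated/reserved stay flat across runs while the reported per-GPU usage grows with the GPU count, the overhead would presumably be coming from outside the allocator (e.g., CUDA context or NCCL buffers) rather than from the model or activations.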
It would be greatly appreciated if someone could tell me why this is the case.