
GPU memory increases with the number of GPUs used


I'm training the base u-net using the `accelerate` command provided in the repo (i.e., `accelerate launch train.py`).

I set the per-GPU batch size to 1. My expectation is that, no matter how many GPUs I use, the memory usage of each GPU should stay roughly the same, since every process handles the same per-GPU batch size.
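
For context, this is roughly how the per-GPU batch size is pinned to 1. It is a minimal sketch assuming a standard `Accelerator`/`DataLoader` setup, not the repo's exact trainer code, and the dataset is a hypothetical stand-in:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

# Hypothetical stand-in for the real dataset; the point here is only
# the batch_size semantics, not the data itself.
dataset = TensorDataset(torch.randn(64, 3, 64, 64))

accelerator = Accelerator()

# With accelerate, `batch_size` is the *per-process* batch size:
# after `prepare`, each GPU draws its own batches of this size from
# its shard of the data, so every process sees batch size 1.
loader = DataLoader(dataset, batch_size=1, shuffle=True)
loader = accelerator.prepare(loader)
```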

However, I find that the more GPUs I use, the higher the per-GPU memory usage becomes, even though the per-GPU batch size is still 1.

For example, when I train the base u-net on a single GPU, the memory usage is:

[0] 19876 / 32510 MB

When I train it with 2 GPUs:

[0] 23892 / 32510 MB
[1] 23732 / 32510 MB

When I train it with 3 GPUs:

[0] 25132 / 32510 MB
[1] 24962 / 32510 MB
[2] 24962 / 32510 MB

When I train it with 8 GPUs:

[0] 31176 / 32510 MB
[1] 31000 / 32510 MB
[2] 30930 / 32510 MB
[3] 30958 / 32510 MB
[4] 30940 / 32510 MB
[5] 30996 / 32510 MB
[6] 31070 / 32510 MB
[7] 30994 / 32510 MB

It would be greatly appreciated if someone could tell me why this is the case.
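
In case it helps with diagnosis, here is a small sketch (my own addition, not from the repo) that I could call inside the training loop. PyTorch's allocator counters only cover tensors PyTorch itself allocated, while nvidia-smi also counts the CUDA context and communication buffers; if the counters below stay flat as the GPU count grows while nvidia-smi climbs, the growth lives outside the caching allocator:

```python
import torch

def report_gpu_memory(tag: str) -> None:
    """Print the caching allocator's view of GPU memory for this process.

    nvidia-smi reports the total per-process footprint (CUDA context,
    communication buffers, cached blocks); these counters only cover
    PyTorch's own allocations, so comparing the two across GPU counts
    shows where the extra memory is coming from.
    """
    device = torch.cuda.current_device()
    allocated = torch.cuda.memory_allocated(device) / 2**20
    reserved = torch.cuda.memory_reserved(device) / 2**20
    print(f"[{tag}] device {device}: "
          f"allocated {allocated:.0f} MB, reserved {reserved:.0f} MB")

# e.g. call once after the first optimizer step:
# report_gpu_memory("after step 1")
```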

DuanXiaoyue-LittleMoon commented on Nov 07 '23