GPU and memory for training

Open 767160 opened this issue 1 year ago • 3 comments

Hi, according to the paper, you used NVIDIA V100 GPUs for training. Were they the 16GB or the 32GB variant? What is the memory footprint of the model in your implementation? Did you use an NVLink connection, or was PCIe sufficient? Thank you.

767160 avatar Aug 08 '23 12:08 767160

Hi,

  1. The V100 chips we used have 32GB of memory.
  2. During training, around 25GB of memory is occupied on each card (batch size is 1).
  3. We used the default setting, PCIe.

Best

198808xc avatar Aug 09 '23 15:08 198808xc

Hi, is the 25GB memory figure measured with gradient checkpointing enabled?

ouyangergou avatar Aug 15 '23 07:08 ouyangergou

Hello, I'm also interested to know whether you used a specific strategy to make training fit in 32GB. With a straightforward implementation of the pseudocode, training doesn't fit on an 80GB GPU. Could you give us some implementation tips?

mikael10j avatar Feb 26 '24 15:02 mikael10j
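
For readers unfamiliar with the gradient checkpointing mentioned above: the thread does not confirm whether the authors used it, so the following is only a minimal, framework-free sketch of the general idea, not their implementation. It uses a toy chain of scalar `tanh` layers; all function names (`forward_segment`, `backward_segment`, `grads_plain`, `grads_checkpointed`) and the `segment_size` parameter are hypothetical and chosen for illustration. The checkpointed version stores only the input of each segment during the forward pass and recomputes the activations inside a segment when the backward pass reaches it, trading extra compute for fewer stored activations.

```python
import math


def forward_segment(x, weights, stored):
    """Apply y = tanh(w * x) for each weight, appending each layer input to `stored`."""
    for w in weights:
        stored.append(x)
        x = math.tanh(w * x)
    return x


def backward_segment(grad, weights, inputs):
    """Backpropagate `grad` through the layers, given each layer's stored input.

    Returns the gradient w.r.t. the segment input and the per-weight gradients.
    """
    w_grads = []
    for w, x in zip(reversed(weights), reversed(inputs)):
        y = math.tanh(w * x)
        local = 1.0 - y * y              # derivative of tanh evaluated at w * x
        w_grads.append(grad * local * x)  # d(loss)/d(w)
        grad = grad * local * w           # d(loss)/d(x)
    w_grads.reverse()
    return grad, w_grads


def grads_plain(x, weights):
    """Standard backprop: keep every layer input alive until the backward pass."""
    stored = []
    forward_segment(x, weights, stored)
    _, w_grads = backward_segment(1.0, weights, stored)  # loss = final output
    return w_grads, len(stored)


def grads_checkpointed(x, weights, segment_size):
    """Checkpointed backprop: keep only segment inputs, recompute the rest."""
    segments = [weights[i:i + segment_size]
                for i in range(0, len(weights), segment_size)]
    checkpoints = []
    for seg in segments:
        checkpoints.append(x)
        x = forward_segment(x, seg, [])  # inner activations are discarded
    peak = len(checkpoints)
    grad, w_grads = 1.0, []
    for seg, x_in in zip(reversed(segments), reversed(checkpoints)):
        stored = []
        forward_segment(x_in, seg, stored)  # recompute this segment's activations
        # Conservative count: all checkpoints plus one segment's activations.
        peak = max(peak, len(checkpoints) + len(stored))
        grad, seg_grads = backward_segment(grad, seg, stored)
        w_grads = seg_grads + w_grads
    return w_grads, peak


if __name__ == "__main__":
    ws = [0.5 + 0.05 * i for i in range(16)]
    g_plain, peak_plain = grads_plain(0.3, ws)
    g_ckpt, peak_ckpt = grads_checkpointed(0.3, ws, segment_size=4)
    print("stored values, plain:       ", peak_plain)
    print("stored values, checkpointed:", peak_ckpt)
    print("gradients match:", all(math.isclose(a, b) for a, b in zip(g_plain, g_ckpt)))
```

In PyTorch the same idea is available through `torch.utils.checkpoint.checkpoint`, typically wrapped around each transformer block or layer group. How much memory it would save for this particular model is not something the thread establishes.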