
[BUG] ZERO3

Open guozhiyao opened this issue 1 year ago • 0 comments

Describe the bug

I trained a 13B-parameter GPT model with ZeRO-3, but GPU memory usage per device does not seem to decrease as the number of GPUs increases. In addition, I enabled offload, but the log shows "CPU Virtual Memory: used = 0.0 GB, percent = 0.0%". Could this be related to the fact that I enabled activation_checkpointing?

"zero_optimization": {
  "stage": 3,
  "overlap_comm": true,
  "contiguous_gradients": true,
  "offload_param": {
    "device": "cpu",
    "pin_memory": false
  },
  "offload_optimizer": {
    "device": "cpu",
    "pin_memory": false
  }
}
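
For reference, here is a rough back-of-the-envelope sketch of how much model-state memory ZeRO-3 would be expected to leave on each GPU. It assumes mixed-precision Adam at about 16 bytes per parameter (2 for fp16 weights, 2 for fp16 gradients, 12 for fp32 optimizer states), split evenly across GPUs. The function name `zero3_model_state_gb` is hypothetical, and the estimate ignores activations, communication buffers, fragmentation, and any CPU offload, so it is only a lower bound to compare against the observed numbers, not a statement of what DeepSpeed reports.

```python
def zero3_model_state_gb(num_params: float, num_gpus: int) -> float:
    """Approximate per-GPU model-state memory in GB under ZeRO-3.

    Assumption: mixed-precision Adam, with no CPU offload modeled.
    """
    fp16_weights = 2 * num_params    # bytes for fp16 parameters
    fp16_grads = 2 * num_params      # bytes for fp16 gradients
    optim_states = 12 * num_params   # fp32 weights, momentum, variance
    total_bytes = fp16_weights + fp16_grads + optim_states
    # ZeRO-3 partitions all three kinds of state across the GPUs.
    return total_bytes / num_gpus / 1e9


if __name__ == "__main__":
    for gpus in (8, 16, 32):
        gb = zero3_model_state_gb(13e9, gpus)
        print(f"{gpus} GPUs: ~{gb:.1f} GB of model states per GPU")
```

Only this model-state share shrinks as GPUs are added. Activation memory depends on the per-GPU micro-batch size and sequence length, so if it dominates, total usage per GPU can look flat across GPU counts.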

guozhiyao, Mar 14 '23 14:03