Zhengxiong Luo

Results 11 comments of Zhengxiong Luo

I have the same problem. I am using ZeRO-3 to train a transformer across multiple nodes. On each node, DeepSpeed allocates much more memory on the GPU with local_rank=0 than on the other GPUs.