Zhengxiong Luo comments

Zhengxiong Luo

[BUG] DeepSpeed allocates GPU memory in an unbalanced way.

I have the same problem. I am using ZeRO-3 to train a transformer across multiple nodes. On each node, DeepSpeed allocates much more memory on the GPU with local_rank=0 than on the other GPUs.
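
To narrow this down, here is a rough diagnostic sketch, not a confirmed fix for this issue. It assumes the launcher exports `LOCAL_RANK` to each process (the `deepspeed` and `torchrun` launchers do). The helper names `local_rank_from_env`, `find_heavy_gpus`, and `report_local_memory` are made up for illustration, and the 1.5x threshold is arbitrary. One common cause of this pattern in PyTorch jobs generally is that a process touches CUDA before selecting its own device, so every process also creates a context on GPU 0. Whether that is what happens here is only a guess.

```python
import os


def local_rank_from_env(default=0):
    """Read LOCAL_RANK, which the launcher is assumed to export per process."""
    return int(os.environ.get("LOCAL_RANK", default))


def find_heavy_gpus(mem_by_local_rank, ratio=1.5):
    """Return local ranks using more than `ratio` times the (upper) median memory."""
    values = sorted(mem_by_local_rank.values())
    median = values[len(values) // 2]
    return [r for r, m in sorted(mem_by_local_rank.items()) if m > ratio * median]


def report_local_memory():
    """Bind this process to its own GPU, then print what it has allocated."""
    import torch  # imported lazily so the helpers above work without a GPU

    rank = local_rank_from_env()
    torch.cuda.set_device(rank)  # select the device before any CUDA allocation
    allocated_mib = torch.cuda.memory_allocated(rank) / 2**20
    print(f"local_rank={rank} allocated={allocated_mib:.1f} MiB")
```

Calling `report_local_memory()` in each process after model setup shows what that process holds on its own GPU. `torch.cuda.memory_allocated` only counts tensors owned by the calling process, so extra contexts that other ranks create on GPU 0 show up in `nvidia-smi` rather than here. Per-GPU numbers collected from `nvidia-smi` can be passed to `find_heavy_gpus` to flag the outliers.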