Megatron-LM use _all_gather_base instead of all

use _all_gather_base instead of all_gather

Open taozhiwei opened this issue 1 year ago • 1 comment

When using all_gather, the output is a list of tensors. In torch's implementation, this list is flattened and unflattened, which causes additional GPU memory allocation and device-to-device (D2D) copies. However, these all-gather call sites already have a flat, contiguous GPU buffer, so replacing all_gather with _all_gather_base saves the extra GPU memory allocation and the additional D2D operations.
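To make the difference concrete, here is a minimal sketch of the two call patterns. It is not Megatron-LM code: the helper names (`flat_output_shape`, `gather_with_list`, `gather_flat`) are hypothetical, and it assumes a process group has already been initialized and that the gathered chunks are concatenated along dim 0. It uses `all_gather_into_tensor`, which is the public name recent PyTorch versions give to `_all_gather_base`.

```python
import torch
import torch.distributed as dist


def flat_output_shape(local_shape, world_size):
    """Shape of the gathered tensor when rank chunks are stacked along dim 0."""
    return (world_size * local_shape[0], *local_shape[1:])


def gather_with_list(local, group=None):
    """List-based all_gather: one output tensor per rank, then an explicit concat."""
    world_size = dist.get_world_size(group=group)
    chunks = [torch.empty_like(local) for _ in range(world_size)]
    dist.all_gather(chunks, local, group=group)
    # The concat needs a second buffer and a device-to-device copy.
    return torch.cat(chunks, dim=0)


def gather_flat(local, group=None):
    """Flat all-gather: every rank's data lands directly in one preallocated buffer."""
    world_size = dist.get_world_size(group=group)
    output = torch.empty(
        flat_output_shape(tuple(local.shape), world_size),
        dtype=local.dtype,
        device=local.device,
    )
    # Public equivalent of _all_gather_base; the input must be contiguous.
    dist.all_gather_into_tensor(output, local.contiguous(), group=group)
    return output
```

Both helpers should return the same tensor on every rank; the flat variant just avoids building the per-rank list and the follow-up concat when the caller wants one contiguous result anyway.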

Dec 05 '23 07:12 taozhiwei

Marking as stale. No activity in 60 days.

May 04 '24 18:05 github-actions[bot]
