
[Question] Why can tensor-parallel communication/GEMM overlap happen only when sequence parallelism is enabled?

Open · hxdtest opened this issue · 2 comments

In Megatron, I found the following check on tp_comm_overlap and sequence_parallel:

if args.tp_comm_overlap:
    assert args.sequence_parallel == True, 'Tensor parallel communication/GEMM overlap can happen only when sequence parallelism is enabled'

But why?

hxdtest · Apr 03 '24 09:04

That is because we currently support overlapping GEMM only with AllGather and ReduceScatter. Those are the collectives used when sequence parallelism is enabled; in the other cases, tensor parallelism uses AllReduce instead.

ptrendx · Apr 09 '24 20:04
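To make the answer concrete, here is a single-process sketch in plain Python (not TransformerEngine's or Megatron's code) of why AllGather and ReduceScatter lend themselves to GEMM overlap. With sequence parallelism, the AllGather before a column-parallel GEMM and the ReduceScatter after a row-parallel GEMM can each be split into TP ring steps, where every step runs a GEMM on one sequence chunk while, in a real implementation, the next chunk is in flight. The TP size, shapes, and helper names (`matmul`, `make`, `rows`, `cols`) are illustrative assumptions, and the loops run sequentially, so this only shows that the chunked decomposition gives the same result as the unsharded computation, not actual concurrency.

```python
# Single-process simulation of tensor parallelism + sequence parallelism.
# Ranks are list indices; "receiving from a neighbour" is just a list lookup.
# All sizes and helper names are hypothetical, chosen for illustration only.

TP = 4               # hypothetical tensor-parallel size
S, H, F = 8, 3, 8    # sequence length, hidden size, FFN size (S and F divisible by TP)

def matmul(a, b):
    """Plain-Python GEMM: (m x k) @ (k x n) -> (m x n)."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def make(n_rows, n_cols, seed):
    """Deterministic small-integer matrix, so results compare exactly."""
    return [[(seed + 3 * i + 5 * j) % 7 - 3 for j in range(n_cols)] for i in range(n_rows)]

def rows(m, c, n):
    """Row block c of m when its rows are split into n equal blocks."""
    step = len(m) // n
    return m[c * step:(c + 1) * step]

def cols(m, c, n):
    """Column block c of m when its columns are split into n equal blocks."""
    step = len(m[0]) // n
    return [r[c * step:(c + 1) * step] for r in m]

X = make(S, H, 1)    # full activation; with SP no single rank stores all of it
W1 = make(H, F, 2)   # column-parallel weight, split along F
W2 = make(F, H, 3)   # row-parallel weight, split along F

x_shard = [rows(X, r, TP) for r in range(TP)]    # sequence-parallel input shards
w1_shard = [cols(W1, r, TP) for r in range(TP)]
w2_shard = [rows(W2, r, TP) for r in range(TP)]

# AllGather + column-parallel GEMM, pipelined over TP steps.
# At step k, rank r multiplies the chunk it currently holds, (r + k) % TP;
# in a real ring AllGather it would meanwhile receive the next chunk from rank r + 1.
y = [[None] * TP for _ in range(TP)]    # y[r][c]: rank r's output rows for chunk c
for k in range(TP):
    for r in range(TP):
        c = (r + k) % TP
        y[r][c] = matmul(x_shard[c], w1_shard[r])
y_full = [sum(y[r], []) for r in range(TP)]    # rank r: X @ W1[:, block r], shape S x F/TP

# Row-parallel GEMM + ReduceScatter, pipelined over TP steps.
# The partial sum for chunk c starts on rank c + 1 and moves from rank q - 1 to
# rank q each step; every rank adds its own GEMM contribution, so it ends on rank c.
buf = [None] * TP
for k in range(TP):
    new_buf = []
    for q in range(TP):
        c = (q - 1 - k) % TP
        part = matmul(rows(y_full[q], c, TP), w2_shard[q])
        if k > 0:
            prev = buf[(q - 1) % TP]    # accumulator received from rank q - 1
            part = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(part, prev)]
        new_buf.append(part)
    buf = new_buf
out_shard = buf    # rank q now holds the output rows for sequence chunk q

# Reference: unsharded X @ W1 @ W2, split by sequence chunk.
ref = matmul(matmul(X, W1), W2)
assert all(out_shard[q] == rows(ref, q, TP) for q in range(TP))
print("chunked AllGather/ReduceScatter pipelines match the unsharded result")
```

Without sequence parallelism, the row-parallel output is all-reduced instead, so every rank ends up holding the full S x H result; this sketch does not model that path.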