Megatron-LM [QUESTION] Why should pipeline-model-parallel size be greater than 2 with the interleaved schedule?

Open nullnonenilNULL opened this issue 10 months ago • 4 comments

Mar 25 '24 09:03 nullnonenilNULL

You can't use the interleaved schedule without pipeline parallelism.
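To see why pipeline parallelism is a prerequisite, here is a minimal sketch of how an interleaved schedule spreads model chunks across pipeline ranks. It assumes a round-robin layout (chunk c lives on rank c % pp_size), which is my understanding of Megatron-LM's interleaved layout; `assign_model_chunks`, `pp_size` and `vpp_size` are hypothetical names used only for illustration.

```python
def assign_model_chunks(pp_size: int, vpp_size: int) -> dict[int, list[int]]:
    """Map each pipeline rank to the model-chunk indices it holds.

    Hypothetical helper: the model is cut into pp_size * vpp_size chunks and
    chunk c is placed on rank c % pp_size (assumed round-robin layout).
    """
    chunks = {rank: [] for rank in range(pp_size)}
    for chunk in range(pp_size * vpp_size):
        chunks[chunk % pp_size].append(chunk)
    return chunks


print(assign_model_chunks(4, 2))  # {0: [0, 4], 1: [1, 5], 2: [2, 6], 3: [3, 7]}
print(assign_model_chunks(1, 2))  # {0: [0, 1]}
```

With pp_size=1 every chunk lands on the same rank, so there is nothing to interleave. With pp_size=4, chunk 3 on rank 3 feeds chunk 4 on rank 0, so activations wrap around from the last rank back to the first; that ring-shaped traffic is what matters for the PP=2 discussion in this thread.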

Mar 28 '24 17:03 ethanhe42

@ethanhe42 I wonder whether pipeline_model_parallel_size == 2 can be accepted?

Mar 31 '24 13:03 yuantailing

> @ethanhe42 I wonder whether pipeline_model_parallel_size == 2 can be accepted?

@ethanhe42 Same question.

Apr 02 '24 05:04 nullnonenilNULL

I think pipeline_model_parallel_size == 2 could work in practice, but perhaps with little or no benefit in reducing the pipeline bubble?
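For a rough sense of the benefit, the Megatron-LM paper on efficient large-scale training (Narayanan et al., 2021) gives the ideal bubble fraction of the 1F1B schedule as (p - 1) / m for p pipeline stages and m microbatches, and (p - 1) / (v · m) with v interleaved chunks per rank. The sketch below uses a hypothetical `bubble_fraction` helper and assumes equal per-chunk forward/backward times while ignoring communication cost.

```python
from fractions import Fraction


def bubble_fraction(pp_size: int, num_microbatches: int, vpp_size: int = 1) -> Fraction:
    """Ideal pipeline-bubble fraction (p - 1) / (v * m).

    Hypothetical helper; vpp_size=1 corresponds to the non-interleaved schedule.
    """
    return Fraction(pp_size - 1, vpp_size * num_microbatches)


# PP=2 with 8 microbatches, for several numbers of interleaved chunks per rank.
for vpp in (1, 2, 4):
    print(vpp, bubble_fraction(2, 8, vpp))
```

This prints `1 1/8`, `2 1/16` and `4 1/32`: under the ideal model, interleaving still shrinks the bubble at PP=2 in relative terms, but the baseline bubble is already small, and the same paper notes that interleaving multiplies point-to-point communication volume by v, so the net gain at PP=2 may be modest.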

Apr 07 '24 05:04 robotsp

Marking as stale. No activity in 60 days.

Jun 06 '24 18:06 github-actions[bot]

This is because, with PP=2, tensor_send_next and tensor_send_prev are indistinguishable here: https://github.com/NVIDIA/Megatron-LM/blob/main/megatron/core/pipeline_parallel/p2p_communication.py#L586.
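One way to see the ambiguity: as far as I can tell, Megatron-LM computes the next and previous pipeline ranks with wraparound (needed for the interleaved schedule), so with PP=2 the next and previous rank of every stage are the same peer. The simulation below is a simplified model, not Megatron-LM code. It assumes every rank posts its four p2p ops in one batch in a fixed order (the order in `COMBINED_ORDER` is an assumption) and that messages between a pair of ranks are matched only by posting order. In a real run the symptom could be a hang, a shape error, or silently swapped tensors depending on backend and shapes; the model only shows why the two sends cannot be told apart. All names are hypothetical.

```python
from collections import defaultdict, deque

# Assumed posting order for a combined forward+backward batch (illustrative only).
COMBINED_ORDER = ["send_prev", "recv_prev", "send_next", "recv_next"]
# Forward and backward traffic split into separate batches, as in the
# overlap_p2p_comm note in this thread.
FORWARD_BATCH = ["send_next", "recv_prev"]
BACKWARD_BATCH = ["send_prev", "recv_next"]


def ring_neighbors(rank: int, pp_size: int) -> tuple[int, int]:
    """Next and previous pipeline rank, wrapping around the ring."""
    return (rank + 1) % pp_size, (rank - 1) % pp_size


def post_ops(rank: int, pp_size: int, order: list[str]) -> list[tuple[str, int, str]]:
    """Return (kind, peer, payload) tuples in the order this rank posts them."""
    nxt, prv = ring_neighbors(rank, pp_size)
    table = {
        "send_next": ("send", nxt, "activation"),  # forward output to next stage
        "send_prev": ("send", prv, "gradient"),    # input gradient back to prev stage
        "recv_prev": ("recv", prv, "activation"),  # expects forward input
        "recv_next": ("recv", nxt, "gradient"),    # expects output gradient
    }
    return [table[name] for name in order]


def count_mismatches(pp_size: int, order: list[str]) -> int:
    """Count receives that get a different payload kind than they expect,
    matching sends to receives FIFO per (src, dst) pair."""
    queues = defaultdict(deque)  # (src, dst) -> payloads in posting order
    for rank in range(pp_size):
        for kind, peer, payload in post_ops(rank, pp_size, order):
            if kind == "send":
                queues[(rank, peer)].append(payload)
    mismatches = 0
    for rank in range(pp_size):
        for kind, peer, expected in post_ops(rank, pp_size, order):
            if kind == "recv" and queues[(peer, rank)].popleft() != expected:
                mismatches += 1
    return mismatches


for pp in (2, 3, 4):
    print(pp, ring_neighbors(0, pp), count_mismatches(pp, COMBINED_ORDER))
print(count_mismatches(2, FORWARD_BATCH), count_mismatches(2, BACKWARD_BATCH))
```

This prints `2 (1, 1) 4`, `3 (1, 2) 0`, `4 (1, 3) 0` and then `0 0`: at PP=2 every receive in the combined batch picks up the wrong kind of tensor, while splitting forward and backward traffic leaves only one send per peer per batch.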

This is a non-issue with overlap_p2p_comm, since we split forward and backward communication in the steady state. We fixed this here: https://github.com/NVIDIA/Megatron-LM/commit/152c562067cc0de6cbc8fba2a5095208f30d10cd.
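The constraint discussed in this thread can be summarized as a small validation rule. This is a hypothetical sketch of the rule as described here (interleaving needs PP > 1, and PP = 2 is only safe when forward and backward communication are split, as with overlap_p2p_comm); Megatron-LM's actual argument checks may be worded or structured differently, and `check_interleaved_pp_size`, `pp_size` and `vpp_size` are illustrative names.

```python
from typing import Optional


def check_interleaved_pp_size(
    pp_size: int, vpp_size: Optional[int], overlap_p2p_comm: bool
) -> None:
    """Raise ValueError if the pipeline size cannot run the interleaved schedule.

    Hypothetical helper; vpp_size is None when interleaving is disabled.
    """
    if vpp_size is None:
        return  # non-interleaved schedule: this rule does not apply
    if pp_size < 2:
        raise ValueError("interleaved schedule requires pipeline parallelism (PP > 1)")
    if pp_size == 2 and not overlap_p2p_comm:
        # Next and previous ranks coincide, so one combined p2p batch would
        # carry two sends to the same peer.
        raise ValueError("PP=2 with interleaving requires overlap_p2p_comm (or PP > 2)")


check_interleaved_pp_size(4, 2, overlap_p2p_comm=False)  # accepted
check_interleaved_pp_size(2, 2, overlap_p2p_comm=True)   # accepted
```

Calling it with `(2, 2, overlap_p2p_comm=False)` or with `pp_size=1` raises `ValueError`.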

Going to mark this as closed; feel free to re-open if you have additional questions.

Jun 06 '24 19:06 deepakn94
