Megatron-LM
[QUESTION] Why must pipeline-model-parallel size be greater than 2 with the interleaved schedule?
You can't use the interleaved schedule without pipeline parallelism.
@ethanhe42 I wonder whether pipeline_model_parallel_size == 2 can be accepted?
@ethanhe42 same question.
I think pipeline_model_parallel_size == 2 could be accepted in practice, but perhaps with reduced or no benefit in shrinking the pipeline bubble?
Marking as stale. No activity in 60 days.
It is because tensor_send_next and tensor_send_prev here are indistinguishable with PP=2: https://github.com/NVIDIA/Megatron-LM/blob/main/megatron/core/pipeline_parallel/p2p_communication.py#L586.
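To see why PP=2 is the degenerate case, here is a minimal sketch (not Megatron-LM code; the helper functions are hypothetical illustrations of ring rank arithmetic): with only two pipeline stages, each rank's "next" and "previous" neighbors are the same peer, so a send-to-next and a send-to-prev arrive from the same source and the receiver cannot tell which tensor is which.

```python
# Hypothetical helpers mirroring ring-style pipeline rank arithmetic.
def next_rank(rank: int, pp_size: int) -> int:
    """Rank of the next pipeline stage (wraps around the ring)."""
    return (rank + 1) % pp_size

def prev_rank(rank: int, pp_size: int) -> int:
    """Rank of the previous pipeline stage (wraps around the ring)."""
    return (rank - 1) % pp_size

for pp_size in (2, 4):
    for rank in range(pp_size):
        ambiguous = next_rank(rank, pp_size) == prev_rank(rank, pp_size)
        print(f"pp_size={pp_size} rank={rank} next==prev: {ambiguous}")
```

With pp_size=2, next and prev coincide for every rank, so the two point-to-point messages are indistinguishable to the receiver; with pp_size=4 (or any size > 2) they target distinct peers and no ambiguity arises.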
This is a non-issue with overlap_p2p_comm, since we split the forward and backward communication in the steady state. We fixed this here: https://github.com/NVIDIA/Megatron-LM/commit/152c562067cc0de6cbc8fba2a5095208f30d10cd.
Going to mark this as closed, feel free to re-open if you have additional questions.