Megatron-LM

[QUESTION] Encoder with more TP than the decoder

Open MlWoo opened this issue 4 months ago • 0 comments

A new model consisting of a heavy module and a light module can be viewed as a T5-style model, so the encoder's TP size is larger than the decoder's. This TP partitioning is not allowed by the check in https://github.com/NVIDIA/Megatron-LM/blob/main/megatron/core/parallel_state.py#L519. If that check is relaxed, and the line at https://github.com/NVIDIA/Megatron-LM/blob/main/megatron/core/parallel_state.py#L616 is changed to `for x, y in zip(e_ranks, cycle(d_ranks))`, would the model work correctly? What else should I consider?
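For illustration, here is a minimal sketch of what the proposed `zip(e_ranks, cycle(d_ranks))` pairing would produce. The rank numbers and TP sizes below are hypothetical, chosen only to show the case where the encoder TP group (4 ranks) is larger than the decoder TP group (2 ranks): every encoder rank gets paired with a decoder rank, with decoder ranks reused cyclically.

```python
from itertools import cycle

# Illustrative rank numbers (not taken from an actual Megatron-LM run):
e_ranks = [0, 1, 2, 3]  # encoder TP group, TP size 4
d_ranks = [4, 5]        # decoder TP group, TP size 2

# Proposed pairing: iterate over the larger (encoder) group and cycle
# the smaller (decoder) group so that zip does not truncate.
pairs = list(zip(e_ranks, cycle(d_ranks)))
print(pairs)  # [(0, 4), (1, 5), (2, 4), (3, 5)]
```

Note that with plain `zip(e_ranks, d_ranks)` the iteration would stop after two pairs, leaving encoder ranks 2 and 3 unpaired, which is why the `cycle` wrapper is needed on the shorter list.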

MlWoo avatar Oct 06 '24 01:10 MlWoo