Megatron-LM
[QUESTION] Encoder with more TP than the decoder
A new model consisting of a heavy module and a light module could be viewed as a T5-style model, with the heavy module playing the role of the encoder. In that case the encoder's tensor parallel (TP) size should be larger than the decoder's, but this partition is not allowed by the check in https://github.com/NVIDIA/Megatron-LM/blob/main/megatron/core/parallel_state.py#L519. If that check is relaxed, and the line at https://github.com/NVIDIA/Megatron-LM/blob/main/megatron/core/parallel_state.py#L616 is changed to `for x, y in zip(e_ranks, cycle(d_ranks))`, is this OK for the model? What else should I consider?
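For illustration, here is a minimal sketch of what the proposed `zip(e_ranks, cycle(d_ranks))` pairing would produce when the encoder has more TP ranks than the decoder. The rank values below are made up and only stand in for the lists that parallel_state.py actually builds.

```python
from itertools import cycle

# Hypothetical rank lists: suppose the encoder uses TP=4 and the decoder TP=2
# within one pipeline group, so the encoder has more ranks than the decoder.
# These values are illustrative only; the real lists come from parallel_state.py.
e_ranks = [0, 1, 2, 3]   # encoder tensor-parallel ranks
d_ranks = [4, 5]         # decoder tensor-parallel ranks

# Proposed pairing: iterate over the (longer) encoder rank list and reuse
# decoder ranks via cycle(), so every encoder rank gets a decoder partner.
pairs = [(x, y) for x, y in zip(e_ranks, cycle(d_ranks))]
print(pairs)  # [(0, 4), (1, 5), (2, 4), (3, 5)]

# Note the asymmetry: each decoder rank is now paired with multiple encoder
# ranks, so any group built from these pairs must tolerate a rank appearing
# in more than one pairing.
```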