tutel
Examples integrated with Megatron-LM
Could you provide an example integrated with Megatron-LM? Thanks :)
Hello, Megatron-LM already includes a non-dynamic component that supports several MoE functionalities. However, Megatron's expert parameter placement is static and is coupled with a set of Megatron's predefined static parallelism configurations. Adding another MoE implementation to Megatron would break those settings and cause parameter placement conflicts with the dynamic behavior that Tutel requires (e.g. switching top-k or parallelism from time to time).