fastmoe
Example to run Megatron
Hi,
Thanks for the exciting work!! I want to use the parallel methods when running Megatron, but it seems there isn't an example/script to run Megatron, and I cannot find a main function. Could you please share an example of running Megatron with the different parallel methods (e.g., data and model parallel)? Thanks!
To run FastMoE with Megatron, you are supposed to use Megatron's own main function, e.g. pretrain_gpt.py, with FastMoE's patch applied.
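For illustration, here is a rough sketch of what the patched entry point ends up doing: Megatron's pretrain_gpt.py remains the main function, and the model provider wraps the built model with FastMoE when MoE is requested. fmoe.megatron.fmoefy is FastMoE's helper, but the GPTModel constructor arguments and the exact keyword for the expert count below are assumptions that may differ across Megatron/FastMoE versions.

```python
# Sketch only, not FastMoE's exact patch. Assumes Megatron-LM v2.x-style APIs.
from megatron import get_args
from megatron.model import GPTModel
from fmoe.megatron import fmoefy


def model_provider(pre_process=True, post_process=True):
    """Build the GPT model; swap its MLPs for FastMoE experts if requested."""
    args = get_args()
    model = GPTModel(num_tokentypes=0, parallel_output=True,
                     pre_process=pre_process, post_process=post_process)
    if getattr(args, 'fmoefy', False):
        # Replace each transformer MLP with a distributed MoE layer.
        # The keyword name for the expert count is an assumption here.
        model = fmoefy(model, fmoe_num_experts=args.fmoe_num_experts)
    return model
```

The rest of pretrain_gpt.py (dataset provider, forward step, the final pretrain(...) call) stays Megatron's own code.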
Thanks for your response!
Which patch should I use if I want to enable expert parallelism? Thanks!
You should use the patch that matches your Megatron version. The key step to enable MoE is adding the --fmoefy argument when launching pretrain_xxx.py.
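As a minimal sketch of what that flag corresponds to, the patch is expected to register a few extra switches with Megatron's argument parser, roughly like the following. Only --fmoefy is named in this thread; the other flag name, default, and helper function name are assumptions.

```python
# Hypothetical sketch of the extra arguments the FastMoE patch adds to
# Megatron's argparse setup; check your patch version for the real names.
def _add_fastmoe_args(parser):
    group = parser.add_argument_group(title='fastmoe')
    group.add_argument('--fmoefy', action='store_true',
                       help='Replace each transformer MLP with a FastMoE layer.')
    group.add_argument('--fmoe-num-experts', type=int, default=1,
                       help='Experts per MoE layer (assumed flag name).')
    return parser
```

In practice this means you keep your usual pretrain_xxx.py launch command with its data/tensor-parallel arguments and simply append --fmoefy (plus whatever expert-count flag your patch version defines).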