fastmoe
Example to run Megatron
Hi,
Thanks for the exciting work!! I want to use the parallel methods when running Megatron, but it seems there isn't an example/script to run Megatron, and I cannot find a main function. Could you please share an example of running Megatron with the different parallel methods (e.g., data and model parallel)? Thanks!
To run FastMoE with Megatron, you are supposed to use Megatron's own main function, e.g. pretrain_gpt.py, with FastMoE's patch applied.
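For illustration, here is a rough sketch of what the patched entry point ends up doing: Megatron's pretrain_gpt.py remains the main function, and the model provider wraps the built model with FastMoE when MoE is requested. fmoe.megatron.fmoefy is FastMoE's helper, but the GPTModel constructor arguments and the exact keyword for the expert count below are assumptions that may differ across Megatron/FastMoE versions.

```python
# Sketch only, not FastMoE's exact patch. Assumes Megatron-LM v2.x-style APIs.
from megatron import get_args
from megatron.model import GPTModel
from fmoe.megatron import fmoefy


def model_provider(pre_process=True, post_process=True):
    """Build the GPT model; swap its MLPs for FastMoE experts if requested."""
    args = get_args()
    model = GPTModel(num_tokentypes=0, parallel_output=True,
                     pre_process=pre_process, post_process=post_process)
    if getattr(args, 'fmoefy', False):
        # Replace each transformer MLP with a distributed MoE layer.
        # The keyword name for the expert count is an assumption here.
        model = fmoefy(model, fmoe_num_experts=args.fmoe_num_experts)
    return model
```

The rest of pretrain_gpt.py (dataset provider, forward step, the final pretrain(...) call) stays Megatron's own code.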
Thanks for your response!
Which patch should I use if I want to enable expert parallelism? Thanks!
You should use the patch that matches your Megatron version. The key step to enable MoE is adding the --fmoefy argument when launching pretrain_xxx.py.
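As a minimal sketch of what that flag corresponds to, the patch is expected to register a few extra switches with Megatron's argument parser, roughly like the following. Only --fmoefy is named in this thread; the other flag name, default, and helper function name are assumptions.

```python
# Hypothetical sketch of the extra arguments the FastMoE patch adds to
# Megatron's argparse setup; check your patch version for the real names.
def _add_fastmoe_args(parser):
    group = parser.add_argument_group(title='fastmoe')
    group.add_argument('--fmoefy', action='store_true',
                       help='Replace each transformer MLP with a FastMoE layer.')
    group.add_argument('--fmoe-num-experts', type=int, default=1,
                       help='Experts per MoE layer (assumed flag name).')
    return parser
```

In practice this means you keep your usual pretrain_xxx.py launch command with its data/tensor-parallel arguments and simply append --fmoefy (plus whatever expert-count flag your patch version defines).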