LLaMA-MoE-v2
Add MegaBlocks support for MLP MoE
What's New
Add MegaBlocks support for MLP MoE. The dumping & reloading test passes, as verified by a continuous loss decline during further training, but downstream metrics have not been evaluated yet. Please use this feature with caution.
- Conversion from the dense LLaMA model: `smoe/utils/expert_construction/convert_llama_to_mixtral_mb.py`
- Add `moe_type="megablocks"` support for `smoe/models/mixtral/modeling_mixtral.py` (see the usage sketch after this list)
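For reference, below is a minimal sketch of reloading a converted checkpoint with the MegaBlocks MLP MoE path enabled and running the loss-decline sanity check mentioned above. The import path, the `MixtralConfig`/`MixtralForCausalLM` class names, the checkpoint directory, and treating `moe_type` as a config field are assumptions for illustration and may differ from the actual code.

```python
# Sketch only: reload a converted checkpoint with moe_type="megablocks" and
# check that the loss keeps declining over a few steps.
# Assumed (not confirmed here): the import path `smoe.models.mixtral`, the
# MixtralConfig/MixtralForCausalLM names, and the checkpoint directory.
import torch
from smoe.models.mixtral import MixtralConfig, MixtralForCausalLM  # assumed import path

ckpt_dir = "outputs/llama3_8b_8expert_top2_mb"  # hypothetical converted checkpoint

config = MixtralConfig.from_pretrained(ckpt_dir)
config.moe_type = "megablocks"  # switch the MLP MoE implementation to MegaBlocks
model = MixtralForCausalLM.from_pretrained(
    ckpt_dir, config=config, torch_dtype=torch.bfloat16
).cuda()

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
dummy = torch.randint(0, config.vocab_size, (1, 512), device="cuda")

# If dumping & reloading preserved the weights, the loss should decrease steadily.
for step in range(5):
    out = model(input_ids=dummy, labels=dummy)
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    print(f"step {step}: loss {out.loss.item():.4f}")
```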
Performance Test
- Experiments are conducted on 4×A100 GPUs with parameters converted from LLaMA-3-8B (8 experts, top-2).
- The dataset is composed of 50 samples from OpenHermes-2.5.
- bsz=2, grad accum=4, seq len=4096
| Setting | Tokens/GPU/Second |
|---|---|
| w/o MegaBlocks | 13485 |
| w/ MegaBlocks | 19051 |
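For context, here is a rough sketch of how the throughput metric relates to the settings above, assuming data-parallel training where each GPU processes bsz × grad accum micro-batches per optimizer step. The per-step times are back-computed from the table for illustration, not taken from training logs.

```python
# Sketch: derive tokens/GPU/second from the benchmark settings and a measured
# optimizer-step time. The step times below are illustrative, back-computed
# from the table above rather than measured.
bsz, grad_accum, seq_len = 2, 4, 4096
tokens_per_gpu_per_step = bsz * grad_accum * seq_len  # 32768 tokens per optimizer step

def tokens_per_gpu_per_second(step_time_s: float) -> float:
    return tokens_per_gpu_per_step / step_time_s

print(tokens_per_gpu_per_second(2.43))  # ~13485 tokens/GPU/s, w/o MegaBlocks
print(tokens_per_gpu_per_second(1.72))  # ~19051 tokens/GPU/s, w/ MegaBlocks
```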