LLaMA-MoE-v2
Add MegaBlocks support for MLP MoE
What's New
Add MegaBlocks support for MLP MoE. The dumping & reloading test passes, as verified by a continuous loss decline during further training, but downstream metrics have not been evaluated yet. Please use this feature with caution.
- Conversion from the dense LLaMA model: `smoe/utils/expert_construction/convert_llama_to_mixtral_mb.py`
- Add `moe_type="megablocks"` support for `smoe/models/mixtral/modeling_mixtral.py` (see the usage sketch after this list)
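For reference, below is a minimal sketch of reloading a converted checkpoint with the MegaBlocks MLP MoE path enabled and running the loss-decline sanity check mentioned above. The import path, the `MixtralConfig`/`MixtralForCausalLM` class names, the checkpoint directory, and treating `moe_type` as a config field are assumptions for illustration and may differ from the actual code.

```python
# Sketch only: reload a converted checkpoint with moe_type="megablocks" and
# check that the loss keeps declining over a few steps.
# Assumed (not confirmed here): the import path `smoe.models.mixtral`, the
# MixtralConfig/MixtralForCausalLM names, and the checkpoint directory.
import torch
from smoe.models.mixtral import MixtralConfig, MixtralForCausalLM  # assumed import path

ckpt_dir = "outputs/llama3_8b_8expert_top2_mb"  # hypothetical converted checkpoint

config = MixtralConfig.from_pretrained(ckpt_dir)
config.moe_type = "megablocks"  # switch the MLP MoE implementation to MegaBlocks
model = MixtralForCausalLM.from_pretrained(
    ckpt_dir, config=config, torch_dtype=torch.bfloat16
).cuda()

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
dummy = torch.randint(0, config.vocab_size, (1, 512), device="cuda")

# If dumping & reloading preserved the weights, the loss should decrease steadily.
for step in range(5):
    out = model(input_ids=dummy, labels=dummy)
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    print(f"step {step}: loss {out.loss.item():.4f}")
```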
Performance Test
- Experiments are conducted on 4×A100 GPUs with parameters converted from LLaMA-3-8B (8 experts, top-2).
- The dataset is composed of 50 samples from OpenHermes-2.5.
- bsz=2, grad accum=4, seq len=4096
| Setting | Tokens/GPU/Second |
|---|---|
| w/o MegaBlocks | 13485 |
| w/ MegaBlocks | 19051 |
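For context, here is a rough sketch of how the throughput metric relates to the settings above, assuming data-parallel training where each GPU processes bsz × grad accum micro-batches per optimizer step. The per-step times are back-computed from the table for illustration, not taken from training logs.

```python
# Sketch: derive tokens/GPU/second from the benchmark settings and a measured
# optimizer-step time. The step times below are illustrative, back-computed
# from the table above rather than measured.
bsz, grad_accum, seq_len = 2, 4, 4096
tokens_per_gpu_per_step = bsz * grad_accum * seq_len  # 32768 tokens per optimizer step

def tokens_per_gpu_per_second(step_time_s: float) -> float:
    return tokens_per_gpu_per_step / step_time_s

print(tokens_per_gpu_per_second(2.43))  # ~13485 tokens/GPU/s, w/o MegaBlocks
print(tokens_per_gpu_per_second(1.72))  # ~19051 tokens/GPU/s, w/ MegaBlocks
```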