
Add megablocks support for MLP MoE

Open · Spico197 opened this issue on Dec 7, 2024 · 0 comments

What's New

Adds MegaBlocks support for the MLP MoE. The dumping & reloading test passes, as verified by a continuously declining training loss, but downstream metrics have not been evaluated yet, so please use this with caution.

  1. Conversion from the dense LLaMA model: `smoe/utils/expert_construction/convert_llama_to_mixtral_mb.py`
  2. Add `moe_type="megablocks"` support in `smoe/models/mixtral/modeling_mixtral.py` (see the sketch after this list).
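For orientation, here is a minimal sketch of how the new path might be exercised end to end. Only the script path and `moe_type="megablocks"` come from the list above; the command-line arguments, the `MixtralConfig`/`MixtralForCausalLM` imports, and the config field access are assumptions about the repo layout, not verified APIs.

```python
# Hypothetical end-to-end sketch: argument names, class names, and import
# paths are assumptions; only the script path and moe_type="megablocks"
# are stated in this issue.

# 1) Convert the dense LLaMA checkpoint into a Mixtral-style MoE checkpoint
#    with MegaBlocks-compatible experts, e.g.:
#
#    python smoe/utils/expert_construction/convert_llama_to_mixtral_mb.py \
#        --src_model_dir /path/to/Meta-Llama-3-8B \
#        --tgt_model_dir /path/to/llama-3-8b-moe-mb \
#        --num_experts 8 --top_k 2

# 2) Reload the converted checkpoint with the MegaBlocks MoE path enabled:
from smoe.models.mixtral.configuration_mixtral import MixtralConfig
from smoe.models.mixtral.modeling_mixtral import MixtralForCausalLM

ckpt_dir = "/path/to/llama-3-8b-moe-mb"
config = MixtralConfig.from_pretrained(ckpt_dir)
config.moe_type = "megablocks"  # route the MLP MoE through the MegaBlocks kernels

model = MixtralForCausalLM.from_pretrained(ckpt_dir, config=config)
model.cuda().train()  # continue training; the loss should keep declining
```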

Performance Test

  - Experiments are conducted on 4×A100 GPUs with parameters converted from LLaMA-3-8B (8 experts, top-2).
  - The dataset consists of 50 samples from OpenHermes-2.5.
  - bsz=2, grad accum=4, seq len=4096
| Setting | Tokens/GPU/Second |
| --- | --- |
| w/o MegaBlocks | 13485 |
| w/ MegaBlocks | 19051 |
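A back-of-the-envelope check of these numbers (not part of the original report): with the settings above, each optimizer step processes bsz × grad accum × seq len = 32,768 tokens per GPU, and the reported throughputs correspond to roughly a 1.41× speedup from MegaBlocks.

```python
# Sanity-check arithmetic on the reported settings and throughputs.
bsz, grad_accum, seq_len, num_gpus = 2, 4, 4096, 4

tokens_per_step_per_gpu = bsz * grad_accum * seq_len        # 32,768 tokens
tokens_per_step_total = tokens_per_step_per_gpu * num_gpus  # 131,072 tokens

tps_without_mb = 13485  # tokens/GPU/second w/o MegaBlocks (table above)
tps_with_mb = 19051     # tokens/GPU/second w/  MegaBlocks (table above)

print(f"speedup: {tps_with_mb / tps_without_mb:.2f}x")                               # ~1.41x
print(f"step time w/o MegaBlocks: {tokens_per_step_per_gpu / tps_without_mb:.2f}s")  # ~2.43s
print(f"step time w/  MegaBlocks: {tokens_per_step_per_gpu / tps_with_mb:.2f}s")     # ~1.72s
```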
