
Implement Mixture of Depth and Experts (MoDE)

casper-hansen opened this issue 10 months ago · 2 comments

Given that MegaBlocks is highly optimized for sparse MoE models such as Mixtral, I am requesting support for a variant recently proposed by Google DeepMind, termed Mixture-of-Depths-and-Experts (MoDE). It combines expert routing with Mixture-of-Depths (MoD), where a per-block router lets most tokens skip the block entirely. The main benefit is much faster training and inference due to the increased sparsity.

Paper: https://arxiv.org/abs/2404.02258

I found two implementations:

  • https://github.com/epfml/llm-baselines/blob/mixture_of_depth/src/models/mod.py
  • https://github.com/kyegomez/Mixture-of-Depths/blob/main/mixture_of_depths/main.py
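For context, the core MoD mechanism is simple: a learned router scores every token, only the top-k tokens (a fixed capacity fraction of the sequence) are processed by the block, and the rest pass through on the residual path. Below is a minimal NumPy sketch of that routing step, not taken from either linked implementation; the function and variable names (`mixture_of_depths_block`, `w_router`, `capacity`) are my own placeholders, and a real version would use an autograd framework so the score-weighted output keeps the router trainable.

```python
import numpy as np

def mixture_of_depths_block(x, w_router, block_fn, capacity=0.25):
    """Hypothetical MoD routing sketch.

    x:        (seq_len, d_model) token activations
    w_router: (d_model,) linear router weights (one scalar score per token)
    block_fn: the expensive block (e.g. attention + MLP) applied only to
              the selected tokens
    capacity: fraction of tokens routed through the block
    """
    seq_len = x.shape[0]
    k = max(1, int(seq_len * capacity))
    scores = x @ w_router                 # one router score per token
    top_idx = np.argsort(scores)[-k:]     # k highest-scoring tokens
    out = x.copy()                        # unselected tokens skip the block
    # Weight the block output by the router score; in a real autograd
    # implementation this is what makes the routing decision trainable.
    out[top_idx] = x[top_idx] + scores[top_idx, None] * block_fn(x[top_idx])
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 8))
w = rng.standard_normal(8)
y = mixture_of_depths_block(x, w, lambda t: t * 0.1, capacity=0.25)
# With capacity=0.25, only 4 of the 16 token rows are updated.
changed = int((np.abs(y - x).sum(axis=1) > 1e-9).sum())
print(changed)
```

Because the compute for each block shrinks to a fixed top-k subset of tokens, this composes naturally with MegaBlocks-style grouped/sparse expert kernels, which is what makes MoDE attractive here.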

casper-hansen avatar Apr 05 '24 12:04 casper-hansen