Implement Mixture of Depth and Experts (MoDE)
Given that MegaBlocks is highly optimized for sparse MoE models like Mixtral, I am requesting support for Mixture-of-Depths-and-Experts (MoDE), a variant recently proposed by Google DeepMind. Because each block only processes a capacity-limited subset of tokens, it promises much faster training and inference through the added sparsity (see the routing sketch after the links below).
Paper: https://arxiv.org/abs/2404.02258
I found two implementations:
- https://github.com/epfml/llm-baselines/blob/mixture_of_depth/src/models/mod.py
- https://github.com/kyegomez/Mixture-of-Depths/blob/main/mixture_of_depths/main.py
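To illustrate the routing pattern MoDE relies on, here is a minimal sketch of Mixture-of-Depths-style token routing in PyTorch. It assumes a generic `block` module (which could be a MegaBlocks MoE layer) and a hypothetical `capacity_ratio` parameter; none of these names come from the MegaBlocks API, and this is not the paper's exact formulation, just a rough outline of the idea.

```python
import torch
import torch.nn as nn


class MoDRouter(nn.Module):
    """Routes a fixed fraction of tokens through `block`; the rest skip it."""

    def __init__(self, d_model: int, block: nn.Module, capacity_ratio: float = 0.125):
        super().__init__()
        self.router = nn.Linear(d_model, 1)   # scalar routing score per token
        self.block = block                    # e.g. an MoE feed-forward layer
        self.capacity_ratio = capacity_ratio  # fraction of tokens processed (illustrative)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        batch, seq_len, d_model = x.shape
        k = max(1, int(seq_len * self.capacity_ratio))

        scores = self.router(x).squeeze(-1)             # (batch, seq_len)
        topk_scores, topk_idx = scores.topk(k, dim=-1)  # tokens that get compute

        # Gather the selected tokens and run only those through the heavy block.
        idx = topk_idx.unsqueeze(-1).expand(-1, -1, d_model)  # (batch, k, d_model)
        selected = torch.gather(x, 1, idx)
        processed = self.block(selected)

        # Scale by the routing weight so the router receives a gradient,
        # then scatter the residual-updated tokens back; skipped tokens pass through unchanged.
        processed = processed * torch.sigmoid(topk_scores).unsqueeze(-1)
        return x.scatter(1, idx, selected + processed)
```

In practice you would wrap an existing MegaBlocks MoE layer, e.g. `MoDRouter(d_model, moe_layer, capacity_ratio=0.125)`, so each transformer block only spends expert compute on the routed tokens. The two implementations linked above handle the details (causal-safe routing at inference, auxiliary losses) that this sketch leaves out.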