Liger-Kernel
MoE kernel
🚀 The feature, motivation and pitch
Currently the most popular library is probably https://github.com/databricks/megablocks. It would be interesting to implement it in Triton and make it HF compatible.
Alternatives
No response
Additional context
No response
Will do more research on this; if anyone has any insights on what could/should be implemented, or details on how, cc me.
Maybe a preliminary step would be to support, for example, mixtral/nllb_moe from huggingface, so the integration is ready when the layers are done? (Something like the sketch below.)
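Purely as a sketch of what that integration hook could look like, following the monkey-patching style of the repo's existing `apply_liger_kernel_to_*` entry points. The function name and the `moe_block_cls` argument are hypothetical placeholders for a Liger MoE layer that doesn't exist yet:

```python
# Hypothetical integration sketch: swap HF Mixtral's sparse MoE block for a
# (future) Liger implementation, mirroring the existing monkey-patch pattern.
from transformers.models.mixtral import modeling_mixtral


def apply_liger_moe_kernel_to_mixtral(moe_block_cls) -> None:
    """Replace MixtralSparseMoeBlock with `moe_block_cls` (placeholder for a Liger MoE layer)."""
    modeling_mixtral.MixtralSparseMoeBlock = moe_block_cls
```

This only affects models instantiated after the patch is applied, which matches how the existing patches work.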
@S1ro1 one straightforward idea is to parallelize the expert forward (just like what the megablocks impl does). Right now in the HF model code the MoE block is performed sequentially, expert by expert. I'm not sure if it's worth implementing the load-balancing loss too; I haven't seen an actual profiling trace of MoE model training.
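To make the sequential pattern concrete, here is a rough, simplified sketch of what an HF-style sparse MoE block does; class and variable names are mine, not the actual `transformers` code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class NaiveSequentialMoE(nn.Module):
    """Simplified stand-in for an HF-style sparse MoE block (not the real Mixtral code)."""

    def __init__(self, hidden_size: int, ffn_size: int, num_experts: int, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(hidden_size, num_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(hidden_size, ffn_size),
                nn.SiLU(),
                nn.Linear(ffn_size, hidden_size),
            )
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        tokens = x.reshape(-1, x.size(-1))                    # (num_tokens, hidden)
        routing = F.softmax(self.router(tokens), dim=-1)
        topk_w, topk_idx = routing.topk(self.top_k, dim=-1)   # per-token expert choices
        topk_w = topk_w / topk_w.sum(dim=-1, keepdim=True)    # renormalize top-k weights
        out = torch.zeros_like(tokens)
        # The sequential part: one expert at a time, each on its own subset of tokens.
        for e, expert in enumerate(self.experts):
            token_idx, slot = torch.where(topk_idx == e)
            if token_idx.numel() == 0:
                continue
            expert_out = expert(tokens[token_idx]) * topk_w[token_idx, slot].unsqueeze(-1)
            out.index_add_(0, token_idx, expert_out)
        return out.reshape_as(x)
```

The `for` loop over experts is what could be replaced by a single fused/grouped computation.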
@yundai424 Haven't seen one either. I'm going to try patching either Mixtral or NLLB with our kernels and profile it, and will decide what to do after that. Implementing dMoE (dropless MoE) could also be interesting. I'll try to send the profiler benchmarks tomorrow (roughly along the lines of the sketch below) so we can discuss in more depth. Also, I suppose Mixtral > NLLB.
Edit: to address your comment, parallelizing the experts is certainly low-hanging fruit.
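For the profiling step, something like the following `torch.profiler` run is what I have in mind. The import path for `MixtralSparseMoeBlock` and the tuple return value are based on recent `transformers` versions, the config sizes are made up, and it needs a CUDA device:

```python
import torch
from torch.profiler import profile, ProfilerActivity
from transformers import MixtralConfig
from transformers.models.mixtral.modeling_mixtral import MixtralSparseMoeBlock

# Small, made-up config just to exercise the block; not a real Mixtral size.
config = MixtralConfig(
    hidden_size=1024, intermediate_size=3584, num_local_experts=8, num_experts_per_tok=2
)
block = MixtralSparseMoeBlock(config).cuda().half()
x = torch.randn(4, 512, config.hidden_size, device="cuda", dtype=torch.float16)

with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
    for _ in range(10):
        out = block(x)
        # Recent transformers versions return (hidden_states, router_logits).
        hidden = out[0] if isinstance(out, tuple) else out
        hidden.float().sum().backward()

print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=20))
prof.export_chrome_trace("mixtral_moe_trace.json")  # open in chrome://tracing or Perfetto
```

That should make it obvious how much time goes into the per-expert loop versus the router and everything else.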
@yundai424 @S1ro1 I'd like to help with this, but I wanted to pin down some of the exact steps that can be taken to make the MoE layer more efficient.
Per my understanding, the HF implementation of Mixtral calls the experts sequentially because each expert can be allocated a variable number of tokens and they wanted to avoid dropping any tokens.
I guess we can start off by implementing the ParallelMLP code in this repo, but I'm not sure this will actually involve any new Triton/Liger kernels. Most of the logic there seems to deal with sharding and distributing the required tensors across ranks (see the token-grouping sketch below for the single-device part).
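For what it's worth, here is a rough sketch of the token-grouping layout that a megablocks-style dropless/parallel expert computation builds on; the function and variable names are mine, not megablocks' actual API:

```python
import torch


def group_tokens_by_expert(tokens: torch.Tensor, topk_idx: torch.Tensor, num_experts: int):
    """tokens: (num_tokens, hidden); topk_idx: (num_tokens, top_k) of expert ids."""
    num_tokens, top_k = topk_idx.shape
    flat_expert = topk_idx.reshape(-1)                                   # assignment -> expert id
    flat_token = torch.arange(num_tokens, device=tokens.device).repeat_interleave(top_k)
    order = torch.argsort(flat_expert, stable=True)                      # group assignments by expert
    permuted_tokens = tokens[flat_token[order]]                          # contiguous rows per expert
    tokens_per_expert = torch.bincount(flat_expert, minlength=num_experts)
    offsets = torch.cumsum(tokens_per_expert, dim=0)                     # segment ends per expert
    return permuted_tokens, flat_token[order], tokens_per_expert, offsets


# Expert e owns rows offsets[e] - tokens_per_expert[e] : offsets[e] of `permuted_tokens`,
# so all expert FFNs can run as one grouped GEMM (e.g. a single Triton kernel) over
# variable-length segments, and the outputs get scattered back with index_add_.
```

This handles the variable-tokens-per-expert issue without dropping anything, which is where a Triton grouped GEMM kernel would actually add value on top of the sharding logic.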
@pramodith I totally agree with starting with the MLP; however, I'm currently swamped with school, so I won't have time to collaborate on this. Feel free to take it.