
Support for MoE models (see Switch Transformer, NLLB)

Open · fiqas opened this issue 1 year ago · 0 comments

Hi, have you considered adding support for Mixture-of-Experts models? They are usually quite hefty in terms of size, so they would be a great fit for offloading parameters to CPU.

Examples: Switch Transformers (https://huggingface.co/google/switch-base-256), NLLB (https://github.com/facebookresearch/fairseq/tree/nllb/)
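
For concreteness, here is a minimal sketch of what CPU/disk offloading of one of these MoE checkpoints looks like today with Hugging Face transformers + accelerate. This is not FlexGen's API; the model name, prompt, and generation settings are only illustrative of the use case being requested.

```python
# Sketch only (not FlexGen's API): offload a large MoE checkpoint across
# GPU / CPU / disk using the accelerate integration in transformers.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "google/switch-base-256"  # Switch Transformer MoE checkpoint (illustrative)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(
    model_name,
    device_map="auto",         # put what fits on GPU, spill the rest to CPU
    offload_folder="offload",  # weights that fit nowhere else go to disk
)

# Switch Transformers are T5-style models trained with span corruption,
# so a sentinel-token prompt is used here purely as a smoke test.
inputs = tokenizer("The <extra_id_0> walks in <extra_id_1> park.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Something along these lines works, but FlexGen's own offloading policy (per-layer weight/KV-cache placement) could presumably do much better for the expert weights, which is the motivation for this request.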

fiqas · Apr 18 '23 13:04