gpt-fast Expert parallelism / MoE example would be awesome :)

Open • andersonbcdefg opened this issue 1 year ago • 1 comment

I loved seeing the blog post with a simple, standalone implementation of many techniques used in production to speed up LLMs. I would love to see this extended to MoE models like Mixtral, which at the moment seem fairly annoying to use and hack on. I'm curious how torch.compile can help with these, and what issues might arise, such as graph breaks due to gating.

Dec 20 '23 07:12 andersonbcdefg

@andersonbcdefg We have added support for Mixtral-8x7B MoE; please check https://github.com/pytorch-labs/gpt-fast/pull/71. Feel free to try it and share feedback.

Feb 01 '24 04:02 yanboliang
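
For readers unfamiliar with the gating concern raised in the original post, here is a minimal, framework-free sketch of top-k expert routing in plain Python. It is not taken from gpt-fast or the linked PR; the names `softmax`, `top_k_route`, `moe_forward`, and the toy scalar `experts` are all hypothetical and chosen only for illustration. The point it shows is that which experts run for a given token depends on the router's output values, and value-dependent branching of this kind is what tends to be awkward for a tracing compiler.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_logits, k=2):
    """Pick the k highest-probability experts and renormalize their weights.

    Returns a list of (expert_index, weight) pairs whose weights sum to 1.
    """
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def moe_forward(token, router_logits, experts, k=2):
    """Weighted sum of the outputs of the selected experts only.

    The loop body runs for a different set of experts depending on the
    values in router_logits: this is the data-dependent part.
    """
    out = 0.0
    for idx, weight in top_k_route(router_logits, k):
        out += weight * experts[idx](token)
    return out


# Hypothetical toy experts: each one just scales its scalar input.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]

# Hypothetical router scores for one token over the four experts.
logits = [0.1, 2.0, -1.0, 1.0]
routes = top_k_route(logits, k=2)
result = moe_forward(1.0, logits, experts, k=2)
```

In a tensor framework, one common way to sidestep the branching is to gather the selected experts' weights by index and compute everything with fixed-shape batched operations, so no Python-level decision depends on tensor values. Whether that matches the approach in the linked PR is best checked against the PR itself.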
