
Expert parallelism / MoE example would be awesome :)

Open · andersonbcdefg opened this issue 1 year ago • 1 comment

I loved seeing the blog post with a simple, standalone implementation of many techniques used in production to speed up LLMs. I would love to see this extended to MoE models like Mixtral, which at the moment seem fairly annoying to use and hack on. I'm curious how torch.compile can help with these, and what issues might arise, such as graph breaks due to gating.

andersonbcdefg · Dec 20 '23 07:12
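
To make the gating concern above concrete, here is a minimal, framework-free sketch. It assumes a Mixtral-style router that sends each token to the top 2 of 8 experts and renormalises the weights over the selected experts. The function names (`top_k_route`, `tokens_per_expert`) are hypothetical and are not taken from gpt-fast or from the PR mentioned in this thread. It only illustrates that the number of tokens each expert receives depends on the input values, which is the kind of data-dependent behaviour that can be awkward for a tracing compiler.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_logits, k=2):
    """Pick the k highest-scoring experts for one token.

    Returns a list of (expert_index, weight) pairs, with weights
    renormalised over the selected experts (an assumption of this sketch).
    """
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    chosen = order[:k]
    weights = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, weights))


def tokens_per_expert(batch_logits, num_experts, k=2):
    """Count how many tokens are routed to each expert."""
    counts = [0] * num_experts
    for logits in batch_logits:
        for expert, _weight in top_k_route(logits, k):
            counts[expert] += 1
    return counts


# Hypothetical sizes for illustration: 8 experts, top-2 routing, 16 tokens.
NUM_EXPERTS, K, NUM_TOKENS = 8, 2, 16
random.seed(0)
batch = [[random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
         for _ in range(NUM_TOKENS)]

counts = tokens_per_expert(batch, NUM_EXPERTS, K)
print(counts)  # per-expert load; varies with the router logits
```

The total number of routed slots is always `NUM_TOKENS * K`, but how they are split across experts changes with the input. An implementation that gathers tokens per expert therefore works with variable-sized groups, whereas one that keeps shapes fixed avoids that at the cost of a different compute pattern.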

@andersonbcdefg We have added support for Mixtral-8x7B MoE; please check https://github.com/pytorch-labs/gpt-fast/pull/71. Feel free to try it and share feedback.

yanboliang · Feb 01 '24 04:02