
Support Mixture of Experts (MoE) Models

Open · AlexCheema opened this issue 1 year ago · 9 comments

[screenshot attachment: IMG_0084]

AlexCheema avatar Jul 18 '24 22:07 AlexCheema

[screenshot attachment: IMG_0085]

AlexCheema avatar Jul 18 '24 22:07 AlexCheema

I looked at this yesterday. It would be great if exo could support DeepSeek-V2; sharding it should be very similar to the llama sharding, applied to DeepseekV2DecoderLayer. But it may also be worth trying model parallelism -> https://github.com/ml-explore/mlx-examples/pull/890
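For context, here is a minimal sketch of what that layer-range sharding could look like for a DeepSeek-V2-style decoder stack. The `Shard` descriptor and class names below are illustrative assumptions, not exo's actual API:

```python
# Hypothetical sketch: layer-range sharding for a DeepSeek-V2-style decoder
# stack. `Shard` and `ShardedDeepseekV2` are illustrative, not exo's real API.
from dataclasses import dataclass

@dataclass
class Shard:
    model_id: str
    start_layer: int  # first decoder layer this node runs (inclusive)
    end_layer: int    # last decoder layer this node runs (inclusive)
    n_layers: int     # total number of decoder layers in the model

class ShardedDeepseekV2:
    """Runs only the decoder layers assigned by `shard`; hidden states are
    handed off to the node holding the next shard."""

    def __init__(self, shard: Shard, decoder_layers):
        self.shard = shard
        # Keep only this node's contiguous slice of DeepseekV2DecoderLayer
        # instances (MoE or dense, the slicing logic is the same).
        self.layers = decoder_layers[shard.start_layer : shard.end_layer + 1]

    def __call__(self, hidden_states):
        for layer in self.layers:
            hidden_states = layer(hidden_states)
        return hidden_states  # forwarded to the next node in the pipeline
```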

mzbac avatar Jul 19 '24 00:07 mzbac

I would like to work on this :)

345ishaan avatar Jul 19 '24 09:07 345ishaan

> I would like to work on this :)

@345ishaan that would be great - go for it

AlexCheema avatar Jul 21 '24 01:07 AlexCheema

Indeed, MoE is the most suitable application scenario for exo and should be prioritized for implementation. Really looking forward to it.

mintisan avatar Jul 23 '24 01:07 mintisan

Looking forward to support for MoE DeepSeek-V2 (total: 236B, active: 21B):

+-----------------------+---------------+-------------------+----------------+----------------+
| Model                 | #Total Params | #Activated Params | Context Length | Download       |
+-----------------------+---------------+-------------------+----------------+----------------+
| DeepSeek-V2           | 236B          | 21B               | 128k           | 🤗 HuggingFace |
| DeepSeek-V2-Chat (RL) | 236B          | 21B               | 128k           | 🤗 HuggingFace |
+-----------------------+---------------+-------------------+----------------+----------------+
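As an aside on the total-vs-activated numbers above: in a MoE layer each token is routed to only a few experts, so most expert weights stay cold. A toy NumPy sketch of top-k routing (all names illustrative, not DeepSeek's implementation):

```python
import numpy as np

def moe_forward(x, experts, router_w, top_k=2):
    """x: (hidden,) token activation; experts: list of callables taking and
    returning (hidden,) vectors; router_w: (hidden, n_experts) gate matrix."""
    logits = x @ router_w                      # one router score per expert
    top = np.argsort(logits)[-top_k:]          # indices of the top-k experts
    w = np.exp(logits[top] - logits[top].max())
    w /= w.sum()                               # softmax over selected experts
    # Only these top_k experts' parameters are touched for this token; the
    # rest stay cold, hence 21B "activated" out of 236B total parameters.
    return sum(wi * experts[i](x) for wi, i in zip(w, top))
```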

youmego avatar Aug 01 '24 11:08 youmego

> Looking forward to support for MoE DeepSeek-V2 (total: 236B, active: 21B): …

Yeah, I was planning to experiment with the setup using https://github.com/deepseek-ai/DeepSeek-Coder-V2. I will be looking into it this weekend.

345ishaan avatar Aug 02 '24 03:08 345ishaan

@AlexCheema

Is it possible to have the active parameters favor an NVIDIA CUDA GPU and let the other nodes store the inactive parameters?

ChaseKolozsy avatar Dec 26 '24 07:12 ChaseKolozsy

> @AlexCheema
>
> Is it possible to have the active parameters favor an NVIDIA CUDA GPU and let the other nodes store the inactive parameters?

Yes! We can definitely do something like this. We are moving towards a more general distributed AI framework that will enable things like this.
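A hedged sketch of what such a placement policy might look like: pin the always-active weights (attention, router, shared experts) to the CUDA node and spread the rarely-hit expert weights across the other nodes. `Node` and `place_weights` are hypothetical names, not exo's real API:

```python
# Hypothetical placement policy: dense/always-active weights go to a CUDA
# node; per-expert FFN weights are round-robined over the remaining nodes.
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    has_cuda: bool

def place_weights(nodes, n_experts):
    cuda = [n for n in nodes if n.has_cuda]
    rest = [n for n in nodes if not n.has_cuda] or cuda
    # Attention, router, and shared-expert weights run every token, so they
    # belong on the fastest device available.
    placement = {"dense_and_router": (cuda or rest)[0].name}
    for e in range(n_experts):
        placement[f"expert_{e}"] = rest[e % len(rest)].name
    return placement

# e.g. one CUDA box plus two memory-rich Macs holding the cold experts
print(place_weights([Node("gpu-box", True), Node("mac-1", False),
                     Node("mac-2", False)], n_experts=8))
```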

AlexCheema avatar Dec 26 '24 17:12 AlexCheema