
Support Mixture of Experts (MoE) Models

Open · AlexCheema opened this issue 1 year ago · 9 comments

[screenshot attachment: IMG_0084]

AlexCheema avatar Jul 18 '24 22:07 AlexCheema

[screenshot attachment: IMG_0085]

AlexCheema avatar Jul 18 '24 22:07 AlexCheema

I looked at this yesterday. It would be great if exo could support DeepSeek-V2; sharding it should be very similar to the llama sharding, applied to DeepseekV2DecoderLayer. But it may also be worth trying model parallelism -> https://github.com/ml-explore/mlx-examples/pull/890
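For context, here is a minimal sketch of what that layer-range sharding could look like for a DeepSeek-V2-style decoder stack. The `Shard` descriptor and class names below are illustrative assumptions, not exo's actual API:

```python
# Hypothetical sketch: layer-range sharding for a DeepSeek-V2-style decoder
# stack. `Shard` and `ShardedDeepseekV2` are illustrative, not exo's real API.
from dataclasses import dataclass

@dataclass
class Shard:
    model_id: str
    start_layer: int  # first decoder layer this node runs (inclusive)
    end_layer: int    # last decoder layer this node runs (inclusive)
    n_layers: int     # total number of decoder layers in the model

class ShardedDeepseekV2:
    """Runs only the decoder layers assigned by `shard`; hidden states are
    handed off to the node holding the next shard."""

    def __init__(self, shard: Shard, decoder_layers):
        self.shard = shard
        # Keep only this node's contiguous slice of DeepseekV2DecoderLayer
        # instances (MoE or dense, the slicing logic is the same).
        self.layers = decoder_layers[shard.start_layer : shard.end_layer + 1]

    def __call__(self, hidden_states):
        for layer in self.layers:
            hidden_states = layer(hidden_states)
        return hidden_states  # forwarded to the next node in the pipeline
```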

mzbac avatar Jul 19 '24 00:07 mzbac

I would like to work on this :)

345ishaan avatar Jul 19 '24 09:07 345ishaan

> I would like to work on this :)

@345ishaan that would be great - go for it

AlexCheema avatar Jul 21 '24 01:07 AlexCheema

Indeed, MoE is the most suitable application scenario for exo and should be prioritized for implementation. Really looking forward to it.

mintisan avatar Jul 23 '24 01:07 mintisan

Looking forward to support for MoE DeepSeek-V2 (total: 236B, active: 21B):

+-----------------------+---------------+-------------------+----------------+----------------+
| Model                 | #Total Params | #Activated Params | Context Length | Download       |
+-----------------------+---------------+-------------------+----------------+----------------+
| DeepSeek-V2           | 236B          | 21B               | 128k           | 🤗 HuggingFace |
| DeepSeek-V2-Chat (RL) | 236B          | 21B               | 128k           | 🤗 HuggingFace |
+-----------------------+---------------+-------------------+----------------+----------------+
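As an aside on the total-vs-activated numbers above: in a MoE layer each token is routed to only a few experts, so most expert weights stay cold. A toy NumPy sketch of top-k routing (all names illustrative, not DeepSeek's implementation):

```python
import numpy as np

def moe_forward(x, experts, router_w, top_k=2):
    """x: (hidden,) token activation; experts: list of callables taking and
    returning (hidden,) vectors; router_w: (hidden, n_experts) gate matrix."""
    logits = x @ router_w                      # one router score per expert
    top = np.argsort(logits)[-top_k:]          # indices of the top-k experts
    w = np.exp(logits[top] - logits[top].max())
    w /= w.sum()                               # softmax over selected experts
    # Only these top_k experts' parameters are touched for this token; the
    # rest stay cold, hence 21B "activated" out of 236B total parameters.
    return sum(wi * experts[i](x) for wi, i in zip(w, top))
```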

youmego avatar Aug 01 '24 11:08 youmego

> Looking forward to support for MoE DeepSeek-V2 (total: 236B, active: 21B): …

Yeah, I was planning to experiment with the setup using https://github.com/deepseek-ai/DeepSeek-Coder-V2. I will be looking into it this weekend.

345ishaan avatar Aug 02 '24 03:08 345ishaan

@AlexCheema

Is it possible to have the active parameters favor an NVIDIA CUDA GPU and let the other nodes store the inactive parameters?

ChaseKolozsy avatar Dec 26 '24 07:12 ChaseKolozsy

> @AlexCheema
>
> Is it possible to have the active parameters favor an NVIDIA CUDA GPU and let the other nodes store the inactive parameters?

Yes! We can definitely do something like this. We are moving towards a more general distributed AI framework that will enable things like this.
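A hedged sketch of what such a placement policy might look like: pin the always-active weights (attention, router, shared experts) to the CUDA node and spread the rarely-hit expert weights across the other nodes. `Node` and `place_weights` are hypothetical names, not exo's real API:

```python
# Hypothetical placement policy: dense/always-active weights go to a CUDA
# node; per-expert FFN weights are round-robined over the remaining nodes.
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    has_cuda: bool

def place_weights(nodes, n_experts):
    cuda = [n for n in nodes if n.has_cuda]
    rest = [n for n in nodes if not n.has_cuda] or cuda
    # Attention, router, and shared-expert weights run every token, so they
    # belong on the fastest device available.
    placement = {"dense_and_router": (cuda or rest)[0].name}
    for e in range(n_experts):
        placement[f"expert_{e}"] = rest[e % len(rest)].name
    return placement

# e.g. one CUDA box plus two memory-rich Macs holding the cold experts
print(place_weights([Node("gpu-box", True), Node("mac-1", False),
                     Node("mac-2", False)], n_experts=8))
```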

AlexCheema avatar Dec 26 '24 17:12 AlexCheema