ZhangEnmao

Results: 13 comments of ZhangEnmao

I have the same question, and I am trying to rewrite the mixtral_moe.py file.

I searched online, and it seems the issue may be due to the large size of the model you're loading; your memory may not be sufficient to support...
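If the problem really is host memory during loading, a minimal sketch of a more memory-frugal load with Hugging Face transformers is below. The model ID is just a placeholder for whatever checkpoint you are loading, and `device_map="auto"` assumes accelerate is installed:

```python
import torch
from transformers import AutoModelForCausalLM

# Hypothetical checkpoint; replace with the model you are actually loading.
model_id = "mistralai/Mixtral-8x7B-v0.1"

# Load in half precision and let accelerate place the weights across the
# available GPUs/CPU, which lowers peak host memory while loading.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
    low_cpu_mem_usage=True,
)
```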

![image](https://github.com/cg123/mergekit/assets/53638291/4a9a2a1a-d552-4ede-b761-5b3aee51f045) It's in mixtral_moe.py, bro.

Hi, sorry to bother you again. Could you tell me why mixtral-moe only accepts the Llama structure or the Mixtral structure? Why are other models inappropriate?
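As an illustrative check (not part of mergekit itself), the declared architecture of each candidate model can be read from its config before attempting a MoE merge; the model IDs below are placeholders:

```python
from transformers import AutoConfig

# Hypothetical model IDs; replace with the checkpoints you want to combine.
candidates = [
    "meta-llama/Llama-2-7b-hf",
    "Qwen/Qwen-7B",
]

for name in candidates:
    cfg = AutoConfig.from_pretrained(name, trust_remote_code=True)
    # model_type is "llama" / "mixtral" for the layouts the MoE script expects;
    # other architectures (e.g. "qwen") use different tensor names and layouts.
    print(name, "->", cfg.model_type, cfg.architectures)
```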

Oh, you are truly amazing! Your answer has been of great help to me, and I feel like I have gained a deeper understanding of MergeKit and MoE. If you...

Hey, bro. Good morning! I have an idea now: a Qwen-moe.py file may be necessary, just like the Qwen model having its own Qwen.py file to help load pretrained...

> I met the same problem when I tried to merge the DeepSeek LLaMA model into Mixtral. https://huggingface.co/deepseek-ai/deepseek-llm-7b-base/tree/main It seems that some tensor key_names are not supported in mergekit. We can...
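A minimal sketch for inspecting which tensor key names a checkpoint actually contains, assuming it ships as safetensors shards (the shard filename below is hypothetical):

```python
from safetensors import safe_open

# Hypothetical shard path; point this at one of the model's *.safetensors files.
shard = "deepseek-llm-7b-base/model-00001-of-00002.safetensors"

with safe_open(shard, framework="pt") as f:
    for key in sorted(f.keys()):
        # Keys like "model.layers.0.self_attn.q_proj.weight" should line up
        # with the Llama-style names the merge script knows how to route.
        print(key)
```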

Hi, I want to know how to install this MoE structure to pretrain my model. I haven't done any installation yet. Should I follow the steps in "https://github.com/hpcaitech/ColossalAI/tree/main/examples/language/openmoe", and...

Hi, when I ran your code, I got two errors. Could you help me and give some advice? ![image](https://github.com/NVIDIA/Megatron-LM/assets/53638291/1a804bb2-467f-42e0-b332-54b633a894a2) ![image](https://github.com/NVIDIA/Megatron-LM/assets/53638291/ad86e0e6-a310-40de-94c0-8dc1d87904ed)

Hi, when I set target-tensor-parallel-size > 1, I got the following errors; only target-tensor-parallel-size = 1 works. Is it possible that this is related to the following warning...
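One common cause of tensor-parallel failures is a model dimension that does not divide evenly by the target TP size; below is a minimal, hypothetical sanity check, with the dimension values as placeholders to be read from your checkpoint's config:

```python
# Hypothetical model dimensions; fill in the values from your checkpoint's config.
num_attention_heads = 32
hidden_size = 4096
ffn_hidden_size = 11008
target_tensor_parallel_size = 2

for name, value in [
    ("num_attention_heads", num_attention_heads),
    ("hidden_size", hidden_size),
    ("ffn_hidden_size", ffn_hidden_size),
]:
    if value % target_tensor_parallel_size != 0:
        print(f"{name}={value} is not divisible by TP={target_tensor_parallel_size}")
    else:
        print(f"{name}={value} splits evenly across TP={target_tensor_parallel_size}")
```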