ZhangEnmao

Results: 13 comments of ZhangEnmao

I have the same question, and I am trying to rewrite the mixtral_moe.py file.

I searched online, and it seems the issue may be due to the large size of the model you're loading; your memory may not be sufficient to support...
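If the problem really is host memory during loading, a minimal sketch of a more memory-frugal load with Hugging Face transformers is below. The model ID is just a placeholder for whatever checkpoint you are loading, and `device_map="auto"` assumes accelerate is installed:

```python
import torch
from transformers import AutoModelForCausalLM

# Hypothetical checkpoint; replace with the model you are actually loading.
model_id = "mistralai/Mixtral-8x7B-v0.1"

# Load in half precision and let accelerate place the weights across the
# available GPUs/CPU, which lowers peak host memory while loading.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
    low_cpu_mem_usage=True,
)
```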

![image](https://github.com/cg123/mergekit/assets/53638291/4a9a2a1a-d552-4ede-b761-5b3aee51f045) It's in mixtral_moe.py, bro.

Hi, sorry to bother you again. Could you tell me why mixtral-moe only accepts the Llama structure or the Mixtral structure? Why are other models inappropriate?
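As an illustrative check (not part of mergekit itself), the declared architecture of each candidate model can be read from its config before attempting a MoE merge; the model IDs below are placeholders:

```python
from transformers import AutoConfig

# Hypothetical model IDs; replace with the checkpoints you want to combine.
candidates = [
    "meta-llama/Llama-2-7b-hf",
    "Qwen/Qwen-7B",
]

for name in candidates:
    cfg = AutoConfig.from_pretrained(name, trust_remote_code=True)
    # model_type is "llama" / "mixtral" for the layouts the MoE script expects;
    # other architectures (e.g. "qwen") use different tensor names and layouts.
    print(name, "->", cfg.model_type, cfg.architectures)
```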

Oh, you are truly amazing! Your answer has been of great help to me, and I feel like I have gained a deeper understanding of MergeKit and MoE. If you...

Hey, bro. Good morning! I have an idea now: a Qwen-moe.py file may be necessary, just like the Qwen model having its own Qwen.py file to help load pretrained...

> I met the same problem when I tried to merge the DeepSeek LLaMA model into Mixtral. https://huggingface.co/deepseek-ai/deepseek-llm-7b-base/tree/main It seems that some tensor key_names are not supported in mergekit. We can...
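A minimal sketch for inspecting which tensor key names a checkpoint actually contains, assuming it ships as safetensors shards (the shard filename below is hypothetical):

```python
from safetensors import safe_open

# Hypothetical shard path; point this at one of the model's *.safetensors files.
shard = "deepseek-llm-7b-base/model-00001-of-00002.safetensors"

with safe_open(shard, framework="pt") as f:
    for key in sorted(f.keys()):
        # Keys like "model.layers.0.self_attn.q_proj.weight" should line up
        # with the Llama-style names the merge script knows how to route.
        print(key)
```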

Hi, I want to know how to install this MoE structure to pretrain my model. I haven't done any installation yet. Should I follow the steps in "https://github.com/hpcaitech/ColossalAI/tree/main/examples/language/openmoe", and...

Hi, when I ran your code, I got two errors. Could you help me and give some advice? ![image](https://github.com/NVIDIA/Megatron-LM/assets/53638291/1a804bb2-467f-42e0-b332-54b633a894a2) ![image](https://github.com/NVIDIA/Megatron-LM/assets/53638291/ad86e0e6-a310-40de-94c0-8dc1d87904ed)

Hi, when I set target-tensor-parallel-size > 1, I got the following errors; only target-tensor-parallel-size = 1 works. Is it possible that this is related to the following warning...
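One common cause of tensor-parallel failures is a model dimension that does not divide evenly by the target TP size; below is a minimal, hypothetical sanity check, with the dimension values as placeholders to be read from your checkpoint's config:

```python
# Hypothetical model dimensions; fill in the values from your checkpoint's config.
num_attention_heads = 32
hidden_size = 4096
ffn_hidden_size = 11008
target_tensor_parallel_size = 2

for name, value in [
    ("num_attention_heads", num_attention_heads),
    ("hidden_size", hidden_size),
    ("ffn_hidden_size", ffn_hidden_size),
]:
    if value % target_tensor_parallel_size != 0:
        print(f"{name}={value} is not divisible by TP={target_tensor_parallel_size}")
    else:
        print(f"{name}={value} splits evenly across TP={target_tensor_parallel_size}")
```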