ZhangEnmao
I have the same question, and I am trying to rewrite the mixtral_moe.py file.
I searched online, and it seems the issue may be due to the large size of the model you're loading; your memory may not be sufficient to support...
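If it really is a memory problem, something like this low-memory loading sketch might help (just an assumed example with Hugging Face transformers, not code from this repo; the model id is a placeholder, and device_map="auto" needs accelerate installed):

```python
# Minimal sketch, assuming the out-of-memory diagnosis above; the model id is a placeholder.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mixtral-8x7B-v0.1",  # placeholder model id
    torch_dtype=torch.float16,       # half-precision weights instead of float32
    low_cpu_mem_usage=True,          # avoid materializing a second full copy in RAM
    device_map="auto",               # requires accelerate; shards layers across GPU/CPU
)
```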
 in mixtral_moe.py, bro
Hi, sorry to bother you again. Could you tell me why mixtral-moe only supports the Llama structure or the Mixtral structure? Why are other models inappropriate?
Oh, you are truly amazing! Your answer has been of great help to me, and I feel like I have gained a deeper understanding of MergeKit and MOE. If you...
Hey, bro. Good morning! I have an idea now: a Qwen-moe.py file may be necessary, just like the Qwen model having its own Qwen.py file to help load pretrained...
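To illustrate the idea (purely a hypothetical sketch, not actual mergekit code; the helper name and the supported-architecture set are my own assumptions), such a Qwen-aware script could start by checking the donor model's declared architecture:

```python
# Hypothetical sketch only -- not mergekit code. The helper name and the
# SUPPORTED_ARCHITECTURES set are assumptions for illustration.
from transformers import AutoConfig

SUPPORTED_ARCHITECTURES = {"LlamaForCausalLM", "MixtralForCausalLM", "Qwen2ForCausalLM"}

def is_supported(model_path: str) -> bool:
    """Return True if the donor model declares an architecture the MoE script can map."""
    config = AutoConfig.from_pretrained(model_path)
    return any(arch in SUPPORTED_ARCHITECTURES for arch in (config.architectures or []))
```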
> I met the same problem when I tried to merge the Deepseek llama model into Mixtral. https://huggingface.co/deepseek-ai/deepseek-llm-7b-base/tree/main It seems that some tensor key_names are not supported in merge-kit. We can...
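One quick way to see which tensor key names a checkpoint actually contains (an assumed diagnostic sketch, not part of merge-kit; the shard filename is a placeholder) is to list the keys of a downloaded shard:

```python
# Assumed diagnostic sketch: list the tensor key names in one checkpoint shard
# to spot keys that merge-kit does not recognize. The filename is a placeholder.
from safetensors import safe_open

with safe_open("model-00001-of-00002.safetensors", framework="pt") as f:
    for key in f.keys():
        print(key)  # e.g. "model.layers.0.self_attn.q_proj.weight"
```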
Hi, I want to know how to install this MoE structure to pretrain my model. I have not done anything about installation yet. Should I follow the steps in "https://github.com/hpcaitech/ColossalAI/tree/main/examples/language/openmoe", and...
Hi, when I run your code, I get two errors. Could you help me and give some advice?
Hi, when I set target-tensor-parallel-size > 1, I get the following errors; only setting target-tensor-parallel-size = 1 works. Is it possible that this is related to the following warning...