There are still some problems with MoE-merging Qwen with other LLMs (like Llama, DeepSeek, etc.)
Here is one piece of code from mergekit/mergekit/moe/qwen.py:
```python
for model_ref in (
    [config.base_model]
    + [e.source_model for e in config.experts]
    + [e.source_model for e in (config.shared_experts or [])]
):
    model_cfg = model_ref.config(trust_remote_code=trust_remote_code)
    model_types.append(model_cfg.model_type)

if len(set(model_types)) != 1:
    if explain:
        logging.warning(
            "Qwen MoE requires all input models to have the same architecture"
        )
    return False
if model_types[0] not in ("llama", "mistral", "qwen2"):
    print("model_types[0]", model_types[0])
    if explain:
        logging.warning(
            "Qwen MoE requires all input models to be Qwen2, Llama or Mistral models"
        )

return True
```
The question is: how can I merge Qwen2 with another LLM when `len(set(model_types))` has to equal 1?
When I change `len(set(model_types)) != 1` to `len(set(model_types)) != 2`, I can finally merge Qwen2 with Llama.
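In context, the edited check looks roughly like this (a fragment of the validation code quoted above, showing only my local workaround, not an upstream patch):

```python
# Local hack: tolerate exactly two distinct architectures (here qwen2 + llama)
# instead of requiring every input model to share a single model_type.
if len(set(model_types)) != 2:  # upstream: != 1
    if explain:
        logging.warning(
            "Qwen MoE requires all input models to have the same architecture"
        )
    return False
```

Note that this hard-codes exactly two architectures, so a config whose models all share one architecture would now be rejected by the same line; the point is only that this check is what blocks any cross-architecture Qwen MoE merge.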
Here is my config.yaml:

```yaml
base_model: */models/Qwen2-7B
architecture: qwen
experts:
  - source_model: */models/CodeLlama-7b-hf
    positive_prompts:
      - "code"
  - source_model: */models/CodeLlama-7b-hf
    positive_prompts:
      - "python"
shared_experts:
  - source_model: /*/models/CodeLlama-7b-hf
    positive_prompts:
      - "programming"
      - "algorithm"
```
The documentation on how to merge Qwen2 is too sparse to be usable. Here are the requirements I ran into, mostly taken from the warning messages in the code (a config sketch that satisfies them follows this list):
- Qwen MoE merge requires exactly one shared expert
- Qwen MoE requires the shared expert to have prompts
- Qwen MoE requires all input models to have the same architecture
- Qwen MoE requires all input models to be Qwen2, Llama or Mistral models
- The prompts of each expert cannot be the same.
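For comparison, here is a config sketch that, as far as I can tell, satisfies all of the constraints above by staying inside one architecture; the model paths and prompt strings are placeholders of my own, not a tested recipe:

```yaml
base_model: */models/Qwen2-7B        # placeholder path; every model is qwen2 architecture
architecture: qwen
experts:
  - source_model: */models/Qwen2-7B-Instruct   # placeholder expert
    positive_prompts:
      - "chat"
  - source_model: */models/Qwen2-Math-7B       # placeholder expert
    positive_prompts:
      - "math"                                  # prompts differ between experts
shared_experts:                                 # exactly one shared expert, with its own prompts
  - source_model: */models/Qwen2-7B
    positive_prompts:
      - "general knowledge"
```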