There are still some problems with MoE-merging Qwen with other LLMs (like Llama, DeepSeek, etc.)
Here is one piece of code from mergekit/mergekit/moe/qwen.py:
```python
for model_ref in (
    [config.base_model]
    + [e.source_model for e in config.experts]
    + [e.source_model for e in (config.shared_experts or [])]
):
    model_cfg = model_ref.config(trust_remote_code=trust_remote_code)
    model_types.append(model_cfg.model_type)

if len(set(model_types)) != 1:
    if explain:
        logging.warning(
            "Qwen MoE requires all input models to have the same architecture"
        )
    return False
if model_types[0] not in ("llama", "mistral", "qwen2"):
    print("model_types[0]", model_types[0])
    if explain:
        logging.warning(
            "Qwen MoE requires all input models to be Qwen2, Llama or Mistral models"
        )

return True
```
The question is: how can I merge Qwen2 with another LLM when `len(set(model_types))` has to equal 1?
When I change `len(set(model_types)) != 1` to `len(set(model_types)) != 2`, I can finally merge Qwen2 with Llama.
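In context, the edited check looks roughly like this (a fragment of the validation code quoted above, showing only my local workaround, not an upstream patch):

```python
# Local hack: tolerate exactly two distinct architectures (here qwen2 + llama)
# instead of requiring every input model to share a single model_type.
if len(set(model_types)) != 2:  # upstream: != 1
    if explain:
        logging.warning(
            "Qwen MoE requires all input models to have the same architecture"
        )
    return False
```

Note that this hard-codes exactly two architectures, so a config whose models all share one architecture would now be rejected by the same line; the point is only that this check is what blocks any cross-architecture Qwen MoE merge.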
Here is my config.yaml:

```yaml
base_model: */models/Qwen2-7B
architecture: qwen
experts:
  - source_model: */models/CodeLlama-7b-hf
    positive_prompts:
      - "code"
  - source_model: */models/CodeLlama-7b-hf
    positive_prompts:
      - "python"
shared_experts:
  - source_model: /*/models/CodeLlama-7b-hf
    positive_prompts:
      - "programming"
      - "algorithm"
```
The documentation on how to merge Qwen2 is too sparse to be usable. Here are the requirements I ran into, mostly taken from the warning messages in the code (a config sketch that satisfies them follows this list):
- Qwen MoE merge requires exactly one shared expert
- Qwen MoE requires the shared expert to have prompts
- Qwen MoE requires all input models to have the same architecture
- Qwen MoE requires all input models to be Qwen2, Llama or Mistral models
- The prompts of each expert cannot be the same.
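For comparison, here is a config sketch that, as far as I can tell, satisfies all of the constraints above by staying inside one architecture; the model paths and prompt strings are placeholders of my own, not a tested recipe:

```yaml
base_model: */models/Qwen2-7B        # placeholder path; every model is qwen2 architecture
architecture: qwen
experts:
  - source_model: */models/Qwen2-7B-Instruct   # placeholder expert
    positive_prompts:
      - "chat"
  - source_model: */models/Qwen2-Math-7B       # placeholder expert
    positive_prompts:
      - "math"                                  # prompts differ between experts
shared_experts:                                 # exactly one shared expert, with its own prompts
  - source_model: */models/Qwen2-7B
    positive_prompts:
      - "general knowledge"
```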