
Failure to merge models

BirgerMoell opened this issue · 2 comments

I'm hitting an error when trying to merge two models. The merge fails partway through when running the LazyMerge notebook: https://colab.research.google.com/drive/1obulZ1ROXHjYLn6PPZJwRR6GzgQogxxb?usp=sharing

Here is my config file.

```python
MODEL_NAME = "rapid-cycling"
yaml_config = """
slices:
  - sources:
      - model: AI-Sweden-Models/gpt-sw3-1.3b-instruct
        layer_range: [0, 32]
      - model: mlabonne/NeuralHermes-2.5-Mistral-7B
        layer_range: [0, 32]
merge_method: slerp
base_model: AI-Sweden-Models/gpt-sw3-1.3b-instruct
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5
dtype: bfloat16
"""
```
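Before merging, it can help to sanity-check that the two models share an architecture, layer count, and hidden size, since a layerwise slerp needs identically shaped tensors at every slice. A minimal sketch with hypothetical config dicts (the values below are illustrative, not read from the actual `config.json` files):

```python
def check_mergeable(cfg_a, cfg_b):
    """Collect obvious blockers for a layerwise merge of two models."""
    problems = []
    if cfg_a["architectures"] != cfg_b["architectures"]:
        problems.append("different architectures")
    if cfg_a["num_hidden_layers"] != cfg_b["num_hidden_layers"]:
        problems.append("different layer counts")
    if cfg_a["hidden_size"] != cfg_b["hidden_size"]:
        problems.append("different hidden sizes")
    return problems

# Illustrative values only -- check each model's config.json for real ones.
gpt_sw3 = {"architectures": ["GPT2LMHeadModel"],
           "num_hidden_layers": 24, "hidden_size": 2048}
mistral = {"architectures": ["MistralForCausalLM"],
           "num_hidden_layers": 32, "hidden_size": 4096}

print(check_mergeable(gpt_sw3, mistral))  # all three checks fail for this pair
```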
The merge fails with this traceback:

```
Traceback (most recent call last):
  File "/usr/local/bin/mergekit-yaml", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/content/mergekit/mergekit/options.py", line 76, in wrapper
    f(*args, **kwargs)
  File "/content/mergekit/mergekit/scripts/run_yaml.py", line 47, in main
    run_merge(
  File "/content/mergekit/mergekit/merge.py", line 77, in run_merge
    exec = Executor(
  File "/content/mergekit/mergekit/graph.py", line 146, in __init__
    self.schedule = self._make_schedule(tasks)
  File "/content/mergekit/mergekit/graph.py", line 244, in _make_schedule
    res = [
  File "/content/mergekit/mergekit/graph.py", line 244, in <listcomp>
    res = [
  File "/usr/local/lib/python3.10/dist-packages/networkx/algorithms/dag.py", line 426, in lexicographical_topological_sort
    zero_indegree = [create_tuple(v) for v, d in G.in_degree() if d == 0]
  File "/usr/local/lib/python3.10/dist-packages/networkx/algorithms/dag.py", line 426, in <listcomp>
    zero_indegree = [create_tuple(v) for v, d in G.in_degree() if d == 0]
  File "/usr/local/lib/python3.10/dist-packages/networkx/algorithms/dag.py", line 422, in create_tuple
    return key(node), nodeid_map[node], node
  File "/content/mergekit/mergekit/graph.py", line 239, in _compare_key
    task.group_label() or "",
  File "/content/mergekit/mergekit/io/tasks.py", line 70, in group_label
    shard_path = loader.index.tensor_paths[self.tensor]
KeyError: 'lm_head.weight'
```
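The final `KeyError` is mergekit failing to find `lm_head.weight` in one checkpoint's tensor index. One plausible reading (an assumption, not confirmed in this thread): GPT-2-style checkpoints usually tie the output head to the input embedding (`transformer.wte.weight`), so their weight map has no separate `lm_head.weight` entry, while Mistral checkpoints do ship one. A toy reproduction of the failing lookup, with made-up index contents:

```python
# Hypothetical weight-map indexes; tensor names and shard files below are
# illustrative, not read from the actual checkpoints.
mistral_index = {
    "model.embed_tokens.weight": "model-00001-of-00003.safetensors",
    "lm_head.weight": "model-00003-of-00003.safetensors",
}
gpt2_style_index = {
    # GPT-2-style models typically tie lm_head to the embedding, so there
    # is no separate "lm_head.weight" entry to look up.
    "transformer.wte.weight": "model-00001-of-00001.safetensors",
}

def shard_for(index, tensor):
    # Mirrors the failing line: loader.index.tensor_paths[self.tensor]
    return index[tensor]

print(shard_for(mistral_index, "lm_head.weight"))  # resolves fine
try:
    shard_for(gpt2_style_index, "lm_head.weight")
except KeyError as err:
    print(f"KeyError: {err}")  # same failure mode as the traceback above
```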

Here are links to the models and their configs:

https://huggingface.co/AI-Sweden-Models/gpt-sw3-1.3b-instruct/tree/main
https://huggingface.co/AI-Sweden-Models/gpt-sw3-1.3b-instruct/blob/main/config.json

This one is a GPT2LMHeadModel, which should be supported.

https://huggingface.co/mlabonne/NeuralHermes-2.5-Mistral-7B
https://huggingface.co/mlabonne/NeuralHermes-2.5-Mistral-7B/blob/main/config.json

BirgerMoell avatar Jan 29 '24 22:01 BirgerMoell

GPT2LMHeadModel is supported, but unfortunately it can't be merged with Mistral-based models. Cross-architecture merging like this is an area of research I'm looking into, but it's not something mergekit implements yet.
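For background on why the architectures must match: slerp interpolates each pair of corresponding weight tensors along the great circle between them, so both models must provide a tensor of the same name and shape at every step, which two different architectures don't. A toy sketch of the interpolation itself (not mergekit's implementation), on plain Python lists:

```python
import math

def slerp(t, a, b, eps=1e-8):
    """Spherical linear interpolation between two same-shaped vectors.

    Falls back to linear interpolation when the vectors are nearly colinear.
    """
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    dot = sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)
    dot = max(-1.0, min(1.0, dot))  # clamp against rounding error
    omega = math.acos(dot)          # angle between the two vectors
    if abs(omega) < eps:
        # Nearly colinear: plain lerp avoids dividing by sin(omega) ~ 0.
        return [(1 - t) * x + t * y for x, y in zip(a, b)]
    so = math.sin(omega)
    return [
        (math.sin((1 - t) * omega) * x + math.sin(t * omega) * y) / so
        for x, y in zip(a, b)
    ]
```

Notice the elementwise `zip(a, b)`: if the two tensors differ in shape (as they would between a 1.3B GPT-2-style model and a 7B Mistral), there is no well-defined pairing to interpolate.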

cg123 avatar Jan 30 '24 04:01 cg123

@cg123 is there any way I can help?

BirgerMoell avatar Jan 30 '24 08:01 BirgerMoell