Failure to merge models
I'm running into an issue when trying to merge two models: the merge step fails when using the LazyMerge notebook. https://colab.research.google.com/drive/1obulZ1ROXHjYLn6PPZJwRR6GzgQogxxb?usp=sharing
Here is my config file:

```python
MODEL_NAME = "rapid-cycling"
yaml_config = """
slices:
  - sources:
      - model: AI-Sweden-Models/gpt-sw3-1.3b-instruct
        layer_range: [0, 32]
      - model: mlabonne/NeuralHermes-2.5-Mistral-7B
        layer_range: [0, 32]
merge_method: slerp
base_model: AI-Sweden-Models/gpt-sw3-1.3b-instruct
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5
dtype: bfloat16
"""
```
```
Traceback (most recent call last):
  File "/usr/local/bin/mergekit-yaml", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/content/mergekit/mergekit/options.py", line 76, in wrapper
    f(*args, **kwargs)
  File "/content/mergekit/mergekit/scripts/run_yaml.py", line 47, in main
    run_merge(
  File "/content/mergekit/mergekit/merge.py", line 77, in run_merge
    exec = Executor(
  File "/content/mergekit/mergekit/graph.py", line 146, in __init__
    self.schedule = self._make_schedule(tasks)
  File "/content/mergekit/mergekit/graph.py", line 244, in _make_schedule
    res = [
  File "/content/mergekit/mergekit/graph.py", line 244, in <listcomp>
    res = [
  File "/usr/local/lib/python3.10/dist-packages/networkx/algorithms/dag.py", line 426, in lexicographical_topological_sort
    zero_indegree = [create_tuple(v) for v, d in G.in_degree() if d == 0]
  File "/usr/local/lib/python3.10/dist-packages/networkx/algorithms/dag.py", line 426, in <listcomp>
    zero_indegree = [create_tuple(v) for v, d in G.in_degree() if d == 0]
  File "/usr/local/lib/python3.10/dist-packages/networkx/algorithms/dag.py", line 422, in create_tuple
    return key(node), nodeid_map[node], node
  File "/content/mergekit/mergekit/graph.py", line 239, in _compare_key
    task.group_label() or "",
  File "/content/mergekit/mergekit/io/tasks.py", line 70, in group_label
    shard_path = loader.index.tensor_paths[self.tensor]
KeyError: 'lm_head.weight'
```
Here are links to the two models and their model configs:

https://huggingface.co/AI-Sweden-Models/gpt-sw3-1.3b-instruct/tree/main
https://huggingface.co/AI-Sweden-Models/gpt-sw3-1.3b-instruct/blob/main/config.json

This one is a GPT2LMHeadModel, which should be supported.

https://huggingface.co/mlabonne/NeuralHermes-2.5-Mistral-7B
https://huggingface.co/mlabonne/NeuralHermes-2.5-Mistral-7B/blob/main/config.json
GPT2LMHeadModel is supported, but it can't be merged with Mistral-based models, unfortunately. mergekit plans the merge around a single architecture's tensor names, so it ends up looking for tensors (here `lm_head.weight`) that the other model's checkpoint doesn't contain, which is the `KeyError` you're seeing. Cross-architecture merging like this is an area of research I'm looking at, but it's not something implemented in mergekit yet.
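If the goal is just to get a SLERP merge running, picking two models that share the same architecture works with your existing `t` schedule. As a sketch (the choice of `mistralai/Mistral-7B-v0.1` as the second model and base is my illustration, not something from your setup, so substitute any Mistral-7B-family model you prefer):

```yaml
slices:
  - sources:
      - model: mistralai/Mistral-7B-v0.1
        layer_range: [0, 32]
      - model: mlabonne/NeuralHermes-2.5-Mistral-7B
        layer_range: [0, 32]
merge_method: slerp
base_model: mistralai/Mistral-7B-v0.1
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5
dtype: bfloat16
```

Both models here have 32 layers and identical tensor names, so the `layer_range: [0, 32]` spans and the per-filter interpolation weights line up one-to-one.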
@cg123 is there any way I can help?