mergekit
Merging models with different structures in linear
When merging models with different structures using the linear merge method, the following error occurred. I understand why the error can happen, but is there a way to skip the specific layers where it occurs?
Traceback (most recent call last):
  File "/usr/local/bin/mergekit-yaml", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/content/mergekit/mergekit/options.py", line 82, in wrapper
    f(*args, **kwargs)
  File "/content/mergekit/mergekit/scripts/run_yaml.py", line 47, in main
    run_merge(
  File "/content/mergekit/mergekit/merge.py", line 92, in run_merge
    for _task, value in exec.run(quiet=options.quiet):
  File "/content/mergekit/mergekit/graph.py", line 197, in run
    res = task.execute(**arguments)
  File "/content/mergekit/mergekit/merge_methods/linear.py", line 52, in execute
    raise RuntimeError(
RuntimeError: Tensor size mismatch for model.layers.0.mlp.down_proj.weight, sizes: [torch.Size([4096, 14336]), torch.Size([4096, 11008])]
This currently isn't supported. Models have to have the same interior dimensions (hidden size & intermediate size) to be merged.
Merging models with different sizes like this is an active area of research, though. There are a few things we're trying internally and I'm hopeful one will pan out.
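In the meantime, since the shape check only fires tensor-by-tensor during the merge itself (as in the traceback above), one way to fail fast is to compare the two checkpoints' interior dimensions before running mergekit-yaml. The sketch below is not a mergekit feature, just a pre-flight check; it assumes both models are Llama/Mistral-style architectures whose config.json exposes hidden_size and intermediate_size, and the model IDs are placeholders.

```python
# Hedged pre-flight check: compare interior dimensions of two checkpoints
# before attempting a linear merge. Not part of mergekit itself.
from transformers import AutoConfig


def interior_dims(model_id: str):
    """Return (hidden_size, intermediate_size) from the model's config.json."""
    cfg = AutoConfig.from_pretrained(model_id)
    return cfg.hidden_size, cfg.intermediate_size


# Placeholder model IDs -- substitute the models from your merge config.
model_a = "org-a/base-model-7b"   # e.g. an MLP of 4096 x 14336
model_b = "org-b/other-model-7b"  # e.g. an MLP of 4096 x 11008

dims_a, dims_b = interior_dims(model_a), interior_dims(model_b)
if dims_a != dims_b:
    raise SystemExit(
        f"Interior dimensions differ ({dims_a} vs {dims_b}); "
        "a linear merge will fail with a tensor size mismatch."
    )
print("Interior dimensions match; the linear merge should not hit this error.")
```

This only tells you whether the merge will hit the size-mismatch error; it does not work around it, since per-layer skipping isn't supported for this case.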
I see, understood. Thank you. I look forward to seeing this repository evolve in the future!