mergekit
Merging models from different pre-trained backbones
Hi @cg123, would it be feasible to merge models from different pre-trained backbones? For example, could we merge a model fine-tuned on Mistral-7b with a model fine-tuned on Llama-2-7b? Or even merge a model fine-tuned on Mistral-7b with one fine-tuned on Llama-2-13b?
If yes, which merge method could we use - passthrough, slerp, or dare_ties?
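(For context, this is the kind of setup I have in mind - a minimal same-backbone slerp sketch in mergekit's YAML config format. The model names below are placeholders for two hypothetical fine-tunes of the same Mistral-7b base, and the parameter values are only illustrative.)

```yaml
# Minimal sketch of a same-backbone slerp merge (placeholder names and values).
# Both source models are assumed to be fine-tunes of the same Mistral-7b base.
slices:
  - sources:
      - model: org-a/mistral-7b-finetune-a   # placeholder
        layer_range: [0, 32]
      - model: org-b/mistral-7b-finetune-b   # placeholder
        layer_range: [0, 32]
merge_method: slerp
base_model: org-a/mistral-7b-finetune-a      # placeholder
parameters:
  t: 0.5          # interpolation factor between the two models
dtype: bfloat16
```

The question is whether anything like this could work when the two sources come from different backbones (e.g. Mistral-7b and Llama-2-7b), where the layer counts, tokenizers, and weight shapes differ.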
My assumption is that even if we can merge them, the performance might not be ideal, because they most likely don't lie on the same error landscape.
I'd appreciate it if you could help clarify :).
I also want to know.
There aren't currently any methods implemented in mergekit that can do this. Exploring approaches that would allow it is a priority - there are a few techniques I've identified as promising, but they need some adaptation to work with transformer-based language models.
So no, not yet! But I'm working on it. :)
Thanks for the clarification! :)