
Merging models from different pre-trained backbones

Open ReezDDDD opened this issue 1 year ago • 1 comments

Hi @cg123, would it be feasible to merge models from different pre-trained backbones? For example, can we merge a model fine-tuned on Mistral-7b with a model fine-tuned on Llama-2-7b? Or even merge a model fine-tuned on Mistral-7b with one fine-tuned on Llama-2-13b?

If so, which kind of merge method could we use, e.g. passthrough, slerp, or dare_ties?
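(For context, all of these methods combine checkpoints tensor-by-tensor over a shared backbone. The sketch below is a naive 50/50 linear average in plain PyTorch, not mergekit's actual slerp or dare_ties implementation, and the fine-tune names are placeholders; it only works because the two state dicts have identical keys and shapes.)

```python
import torch
from transformers import AutoModelForCausalLM

# Hypothetical fine-tune names; both must share the same backbone so that every
# tensor has a matching name and shape.
model_a = AutoModelForCausalLM.from_pretrained("org/mistral-7b-finetune-a", torch_dtype=torch.bfloat16)
model_b = AutoModelForCausalLM.from_pretrained("org/mistral-7b-finetune-b", torch_dtype=torch.bfloat16)

# Naive 50/50 average of the weights (not mergekit's slerp or dare_ties, which add
# spherical interpolation / sparsification and rescaling on top of this idea).
merged = model_a.state_dict()
for name, tensor_b in model_b.state_dict().items():
    merged[name] = 0.5 * merged[name] + 0.5 * tensor_b

model_a.load_state_dict(merged)
model_a.save_pretrained("merged-model")
```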

My assumption is that even if we can merge them, the performance might not be ideal, because the two models very likely don't lie in the same loss landscape.
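(Beyond the loss-landscape issue, the tensor shapes themselves don't line up across these backbones, so element-wise methods have nothing to average. A quick way to see this is to compare the two architectures' configs; the snippet below is just an illustration and assumes you have Hub access to both checkpoints, the Llama-2 repo being gated.)

```python
from transformers import AutoConfig

# Compare the dimensions an element-wise merge would have to reconcile.
mistral = AutoConfig.from_pretrained("mistralai/Mistral-7B-v0.1")
llama = AutoConfig.from_pretrained("meta-llama/Llama-2-7b-hf")

for field in ("vocab_size", "hidden_size", "intermediate_size",
              "num_hidden_layers", "num_attention_heads", "num_key_value_heads"):
    print(f"{field}: Mistral={getattr(mistral, field)}  Llama-2={getattr(llama, field)}")
```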

I'd appreciate it if you could help clarify :).

ReezDDDD avatar Jan 23 '24 06:01 ReezDDDD

I also want to know.

XiaoYee avatar Jan 24 '24 13:01 XiaoYee

There aren't currently any methods implemented in mergekit that can do this. Exploring approaches that would allow it is a priority - I've identified a few promising techniques, but they need some adapting to work with transformer-based language models.

So no, not yet! But I'm working on it. :)

cg123 avatar Jan 25 '24 04:01 cg123

Thanks for the clarification :)!

ReezDDDD avatar Jan 25 '24 04:01 ReezDDDD