How to merge models of two different sizes and architectures?

ddh0 opened this issue 1 year ago · 5 comments

Hello!

I'm trying to merge a 7B Mistral model with a 13B Llama-2 model. In my mind, I'd like to essentially keep the 7B model the same and just expand it to 13B size by grafting on the 13B, if that makes sense. Sorry, I'm new to this.

I know the results probably won't be ideal; it's just for personal experimentation. Is this possible? If so, could you point me in the right direction?

I've tried your frankenllama_22.py script, but I got an error about expanding tensors too much, or something along those lines. Again, sorry, I'm very new to this. Any help is appreciated.

ddh0 · Nov 13 '23

Hi @cg123, I'm interested in this too. I was wondering if I could merge a Llama and a Mistral model. Is this possible?

fakerybakery · Nov 30 '23

Hey, sorry for not getting to this sooner!

The reason Mistral doesn't work with the frankenllama_22 script is that it uses grouped-query attention (GQA), which breaks some assumptions I made when writing it. If you want to make this particular frankenmerge happen, the easiest way is to modify Mistral to not use GQA.

Here's a non-GQA Mistral: https://huggingface.co/chargoddard/Mistral-7B-MHA
The script I used to make it is here for reference: https://gist.github.com/cg123/05e48654d04661a64978045b6aa1dcb9

I think the script should work for this plus a 13B Llama-2 model. Don't expect miracles from this type of merge, though - you won't see any real difference from the base model until you do some fine-tuning.
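
In case it's useful, here's a rough sketch of what that GQA -> MHA conversion boils down to. This is not the gist above, just an illustration assuming the standard transformers Llama/Mistral module layout; treat the model ID and output path as placeholders.

```python
# Sketch: expand Mistral's grouped KV heads so it uses plain multi-head attention.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1", torch_dtype=torch.bfloat16
)
cfg = model.config
n_rep = cfg.num_attention_heads // cfg.num_key_value_heads  # 32 // 8 = 4 for Mistral-7B
head_dim = cfg.hidden_size // cfg.num_attention_heads        # 4096 // 32 = 128

for layer in model.model.layers:
    attn = layer.self_attn
    for name in ("k_proj", "v_proj"):
        proj = getattr(attn, name)
        w = proj.weight.data                                  # [num_kv_heads * head_dim, hidden]
        w = w.view(cfg.num_key_value_heads, head_dim, cfg.hidden_size)
        w = w.repeat_interleave(n_rep, dim=0)                 # duplicate each KV head n_rep times
        proj.weight = torch.nn.Parameter(
            w.reshape(cfg.num_attention_heads * head_dim, cfg.hidden_size)
        )
        proj.out_features = cfg.num_attention_heads * head_dim

cfg.num_key_value_heads = cfg.num_attention_heads             # config now describes plain MHA
model.save_pretrained("./mistral-7b-mha")                     # reload from disk before using
```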

Hope this helps!

cg123 · Dec 01 '23

Hi, thanks for your response. Is it possible, then, to merge a Llama 13B and a 7B model?

fakerybakery · Dec 01 '23

I guess as long as the embedding size is the same, this is possible!

I am also interested in this. Is there a secret recipe 😄 ?

Also, I would prefer to use only part of one model.
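
(For taking part of a model, I believe a mergekit passthrough config along these lines is the usual approach when both models already share the same architecture and hidden size, e.g. two Llama-2 13B variants. The model names and layer ranges below are only placeholders.)

```yaml
slices:
  - sources:
      - model: meta-llama/Llama-2-13b-hf
        layer_range: [0, 24]      # first 24 layers from one model
  - sources:
      - model: NousResearch/Nous-Hermes-Llama2-13b
        layer_range: [16, 40]     # last 24 layers from another 13B
merge_method: passthrough
dtype: float16
```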

shamanez · Dec 05 '23

Hey, how about Yi and Mistral? Yi uses GQA for both the 6B and the 34B. Can Yi be merged with Mistral? Can a model that uses GQA only be merged with another GQA model, and an MHA model only with another MHA model?

DumoeDss · Dec 28 '23