
Will the resulting model size increase much after merging

pradeepdev-1995 opened this issue 1 year ago · 4 comments

Will the resulting model size increase much after merging? E.g., if the first model to merge has size M1, the second has size M2, and the third has size M3, will the final merged model have size (M1 + M2 + M3)?

pradeepdev-1995 · Feb 05 '24

In general, all of the models you merge together will need to be the same size and the output will be that same size as well. So for example if you're merging Mistral models, you'll combine however many 7B models and the output will still be 7B.

There are two exceptions to this. One is if you use the slices: configuration syntax to make a model that has more layers than your input models. (This is commonly called "frankenmerging" and is where models like Goliath or MegaDolphin-120b come from.) The other is the mergekit-moe script, which produces a pseudo-"mixture of experts" that will be approximately the sum of input sizes.

cg123 · Feb 08 '24
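For readers skimming this thread, here is roughly what those two exceptions look like in practice. First, a minimal sketch of a passthrough "frankenmerge" using the slices: syntax mentioned above; the model names and layer ranges are placeholders, not a recommendation:

```yaml
# Sketch of a passthrough "frankenmerge": stack layer ranges from two
# same-architecture models to build a deeper model than either input.
# model-a and model-b are placeholder Hugging Face model IDs.
slices:
  - sources:
      - model: model-a          # layers 0-23 taken from model-a
        layer_range: [0, 24]
  - sources:
      - model: model-b          # layers 8-31 taken from model-b
        layer_range: [8, 32]
merge_method: passthrough
dtype: float16
```

And a rough sketch of a mergekit-moe config, which combines the inputs as experts so the output is approximately the sum of the input sizes (again, the paths and prompts are placeholders; check the mergekit-moe docs for the exact fields):

```yaml
# Sketch of a mergekit-moe config: one shared base plus two expert models.
base_model: path/to/base-7b
gate_mode: hidden              # route tokens by hidden-state similarity to the prompts below
dtype: bfloat16
experts:
  - source_model: path/to/expert-7b-code
    positive_prompts:
      - "write a Python function"
  - source_model: path/to/expert-7b-chat
    positive_prompts:
      - "answer the question conversationally"
```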

So can we merge different models that have the same number of parameters, like Mistral 7B and OpenChat 7B?

pradeepdev-1995 · Feb 08 '24

You can - Mistral and OpenChat both use the Mistral architecture and have the same number of parameters, so you can merge them. The result will also be a 7B-parameter Mistral-architecture model.

cg123 · Feb 08 '24
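To make that concrete, here is a minimal sketch of a config that merges the two 7B models into a single 7B output using a plain linear (weighted-average) merge; the weights and model IDs are only an illustration, and other methods (slerp, ties, dare_ties) take additional parameters:

```yaml
# Sketch of a linear merge of two Mistral-architecture 7B models.
# The output stays 7B; the weights here are an arbitrary 50/50 split.
models:
  - model: mistralai/Mistral-7B-Instruct-v0.2
    parameters:
      weight: 0.5
  - model: openchat/openchat-3.5-0106
    parameters:
      weight: 0.5
merge_method: linear
dtype: float16
```

If I remember the CLI correctly, you then run something like mergekit-yaml config.yml ./merged-model (see the mergekit README for the current flags), and the result is a single 7B checkpoint you can load like any other Hugging Face model.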

Great, thanks. So I have 2 fine-tuned models: 1 - a fine-tuned Mistral 7B Instruct model, 2 - a fine-tuned OpenChat 7B model. I hope that using mergekit I can merge both fine-tuned models and use them as a single model.

And the two fine-tuned models expect different prompt formats, so how can I handle the prompt formats after merging them into a single model?

pradeepdev-1995 · Feb 08 '24