mergekit

Can we use algorithms to automatically optimize the merging of the weights and layers of the model along the most efficient path?

Open win10ogod opened this issue 1 year ago • 2 comments


win10ogod avatar Dec 18 '23 00:12 win10ogod

@win10ogod I have a similar question. When we are using the passthrough method, is there any principled way to select layers from each model? Could we use something like task-arithmetic values to pick the most useful layers?

shamanez avatar Dec 18 '23 00:12 shamanez
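One naive way to make the task-arithmetic idea above concrete (this is a hypothetical heuristic sketch, not a mergekit feature): score each layer by the L2 norm of its task vector, i.e. the finetuned weights minus the base weights, and keep the layers that moved most during finetuning. The dict-of-lists weight format below is a toy stand-in for real state dicts.

```python
import math

def layer_task_scores(base, finetuned):
    """Score each layer by the L2 norm of its task vector
    (finetuned weights minus base weights). A larger norm is used
    here as a crude proxy for how much the layer changed."""
    scores = {}
    for name in base:
        diff = [f - b for f, b in zip(finetuned[name], base[name])]
        scores[name] = math.sqrt(sum(d * d for d in diff))
    return scores

def pick_layers(base, finetuned, k):
    """Return the names of the k layers with the largest task-vector norm."""
    scores = layer_task_scores(base, finetuned)
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Toy example: three "layers", each a flat list of weights.
base = {"layer.0": [0.1, 0.2], "layer.1": [0.3, 0.4], "layer.2": [0.5, 0.6]}
tuned = {"layer.0": [0.1, 0.2], "layer.1": [0.9, 1.0], "layer.2": [0.5, 0.7]}

print(pick_layers(base, tuned, 2))  # → ['layer.1', 'layer.2']
```

Whether "changed most" correlates with "most useful" is exactly the open question; other scores (e.g. per-layer validation probes) could be swapped in for the norm.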

I am pretty sure you are asking the same question as the one I am looking at (if not, sorry to hijack this post). I read the Model Soups paper mentioned by @cg123 here: https://arxiv.org/pdf/2203.05482.pdf

Reading section 4, we see they compare "soups" and ensembling. If I am not mistaken, my understanding is that soups are well suited to models sharing the same initialization weights (seed); otherwise the models take completely different paths, and averaging their weights is either irrelevant or requires post-training (finetuning) that may or may not be beneficial. Ensembling, on the other hand, is suited to different models, since it acts at the logits level, hence taking the "best path" mentioned in the title. Ensembling is de facto superior to soups (as the paper refers to them).

So the question is: do methods other than Linear emulate ensembling better for models that do not share the same initialization?

Am I correct?

vince62s avatar Jan 04 '24 18:01 vince62s
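The soup-vs-ensemble distinction above can be made concrete: averaging weights commutes with the forward pass only for linear models, so for any nonlinear model the two operations generally give different answers. A minimal sketch with a toy one-parameter sigmoid "model" (an illustration, not code from the paper or mergekit):

```python
import math

def predict(w, x):
    """A tiny one-parameter nonlinear 'model': sigmoid(w * x)."""
    return 1.0 / (1.0 + math.exp(-w * x))

w1, w2, x = 4.0, 1.0, 1.0

# Model soup: average the weights, then run one forward pass.
soup_out = predict((w1 + w2) / 2.0, x)

# Ensemble: run both models, then average their outputs (logit level).
ens_out = (predict(w1, x) + predict(w2, x)) / 2.0

print(soup_out, ens_out)  # the two disagree: ~0.924 vs ~0.857
```

The gap between the two outputs is exactly what the soup relies on finetuning (or shared initialization, which keeps the models in one loss basin) to close; ensembling sidesteps it at the cost of running every model at inference time.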