mamba

How to merge weights?

Open junphine opened this issue 1 year ago • 5 comments
I trained 3 models, but after averaging the weights, the model output is garbled!
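
For readers unfamiliar with the operation, here is a minimal sketch of uniform weight averaging. It is an illustration only, not the code used in this issue. The `StateDict` alias and the `average_state_dicts` helper are hypothetical, and plain Python lists stand in for the tensors a real checkpoint would hold. It assumes all checkpoints share the same parameter names and shapes.

```python
from typing import Dict, List

# Hypothetical stand-in for a checkpoint: parameter name -> flat list of values.
# A real checkpoint would hold tensors instead of Python lists.
StateDict = Dict[str, List[float]]


def average_state_dicts(state_dicts: List[StateDict]) -> StateDict:
    """Return the element-wise mean of several checkpoints."""
    if not state_dicts:
        raise ValueError("need at least one state dict")

    # Every checkpoint must expose exactly the same parameter names.
    keys = set(state_dicts[0])
    for sd in state_dicts[1:]:
        if set(sd) != keys:
            raise ValueError("parameter names differ between checkpoints")

    n = len(state_dicts)
    averaged: StateDict = {}
    for name in state_dicts[0]:
        columns = [sd[name] for sd in state_dicts]
        # Every checkpoint must store the same number of values per parameter.
        if len({len(c) for c in columns}) != 1:
            raise ValueError(f"shape mismatch for {name}")
        averaged[name] = [sum(vals) / n for vals in zip(*columns)]
    return averaged


# Toy example with three "models".
a = {"w": [1.0, 2.0], "b": [0.0]}
b = {"w": [3.0, 4.0], "b": [3.0]}
c = {"w": [5.0, 6.0], "b": [6.0]}
merged = average_state_dicts([a, b, c])
```

The sketch only shows the arithmetic. Whether the averaged weights produce sensible output depends on how the models were trained, which is the open question in this thread.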

junphine avatar Jan 29 '24 11:01 junphine

This is an unexplored research direction, I'm not sure what the best practices are here.

albertfgu avatar Jan 29 '24 22:01 albertfgu

[image: 1706603511810]

In my experiments, averaging the weights seems to speed up training.

junphine avatar Jan 30 '24 08:01 junphine

Red is the merged model; cyan is one of the three models.
The learning rate is the same, but the training data is new to the cyan model, whereas the red (merged) model has seen it before.

junphine avatar Jan 30 '24 08:01 junphine

Sorry, I'm lacking a lot of context here and am not sure how to help!

albertfgu avatar Jan 30 '24 16:01 albertfgu

I don't have experience with model merging, keeping this issue open in case there are others who can help.

tridao avatar Jan 30 '24 18:01 tridao