mamba
How to merge weights?
I trained 3 models, but after averaging the weights, the model output is garbled!
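For concreteness, plain uniform weight averaging can be sketched roughly as below. This is only an illustration, not the code used in this issue: `average_state_dicts` is a hypothetical helper, and it assumes every checkpoint has exactly the same parameter names and that each value supports `+` and division by a number (as floats or floating-point PyTorch tensors do).

```python
def average_state_dicts(state_dicts):
    """Return the element-wise mean of several checkpoints.

    Hypothetical helper: each checkpoint is a dict mapping a parameter
    name to a value that supports `+` and `/ number`.
    """
    if not state_dicts:
        raise ValueError("need at least one state dict")

    # All checkpoints must describe the same set of parameters.
    keys = set(state_dicts[0])
    for sd in state_dicts[1:]:
        if set(sd) != keys:
            raise ValueError("checkpoints have different parameter names")

    n = len(state_dicts)
    averaged = {}
    for name in state_dicts[0]:
        total = state_dicts[0][name]
        for sd in state_dicts[1:]:
            total = total + sd[name]  # out-of-place add, inputs stay untouched
        averaged[name] = total / n
    return averaged


# Toy example with scalar "parameters" standing in for tensors.
a = {"w": 1.0, "b": 0.0}
b = {"w": 2.0, "b": 3.0}
c = {"w": 3.0, "b": 6.0}
merged = average_state_dicts([a, b, c])
print(merged)
```

With PyTorch models, the same function could presumably be fed each model's `state_dict()` and the result passed to `load_state_dict`, provided all entries are floating-point tensors of matching shape.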
This is an unexplored research direction, and I'm not sure what the best practices are here.
In my experiments, averaging the weights seems to speed up training.
Red is the merged model; cyan is one of the three models.
The learning rate is the same, but the training data is new to the cyan model and was previously seen by the red one.
Sorry, I'm lacking a lot of context here and am not sure how to help!
I don't have experience with model merging, so I'm keeping this issue open in case others can help.