mergekit
NuSLERP
Adds a new merge method, `nuslerp`. This method provides a superset of the functionality of `slerp`. If provided with a base model, `nuslerp` will perform spherical interpolation of the task vectors. While the original `slerp` always flattens weight tensors into a single dimension, `nuslerp` can also do row-wise and column-wise interpolation of tensors.
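To make the difference between the interpolation modes concrete, here is a minimal NumPy sketch. It is not mergekit's implementation: the names `slerp` and `nuslerp_sketch`, the `axis` argument, and the fallback to linear interpolation for near-parallel vectors are all assumptions for illustration. `axis=None` mimics the original flatten-everything behaviour, `axis=-1` treats each row as its own vector, and `axis=0` does the same per column. Passing `base` interpolates the task vectors (each model minus the base) and adds the base back.

```python
import numpy as np


def slerp(t, v0, v1, axis=None, eps=1e-8):
    """Spherical interpolation from v0 (t=0) to v1 (t=1).

    axis=None flattens each tensor into one vector (classic slerp);
    axis=-1 interpolates every row separately, axis=0 every column.
    """
    # Normalized directions, used only to measure the angle between the inputs
    u0 = v0 / np.maximum(np.linalg.norm(v0, axis=axis, keepdims=True), eps)
    u1 = v1 / np.maximum(np.linalg.norm(v1, axis=axis, keepdims=True), eps)
    dot = np.clip(np.sum(u0 * u1, axis=axis, keepdims=True), -1.0, 1.0)
    omega = np.arccos(dot)
    sin_omega = np.sin(omega)
    # Where the vectors are (nearly) parallel, fall back to linear interpolation
    ok = sin_omega > eps
    denom = np.where(ok, sin_omega, 1.0)
    s0 = np.where(ok, np.sin((1.0 - t) * omega) / denom, 1.0 - t)
    s1 = np.where(ok, np.sin(t * omega) / denom, t)
    # Coefficients are applied to the original (unnormalized) tensors
    return s0 * v0 + s1 * v1


def nuslerp_sketch(t, a, b, base=None, axis=None):
    """Hypothetical helper: slerp the models directly, or their task vectors if base is given."""
    if base is None:
        return slerp(t, a, b, axis=axis)
    # Task vectors are the differences from the base model
    return base + slerp(t, a - base, b - base, axis=axis)


a = np.array([[1.0, 0.0], [1.0, 0.0]])
b = np.array([[0.0, 1.0], [1.0, 0.0]])
flat = nuslerp_sketch(0.5, a, b)           # whole tensor treated as one vector
rows = nuslerp_sketch(0.5, a, b, axis=-1)  # each row interpolated on its own
print(np.round(flat, 4))
print(np.round(rows, 4))
```

The two modes disagree on the second row: row-wise, that row is identical in both inputs and stays at `[1, 0]`, while the flattened version mixes it with the angle of the first row and scales it to roughly `[1.1547, 0]`.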
This method remedies one of my long-standing gripes with how I implemented `slerp`. Instead of taking a `t` parameter and using `base_model` to specify which is the "first" model, `nuslerp` simply takes a `weight` parameter for each model and computes the interpolation factor `t` internally. This makes it fit the conventions of the other merge methods much better. The `weight` parameter behaves in the same fashion as it does for `merge_method: linear` with `normalize: true`.
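For two models, a normalized linear merge is exactly linear interpolation: (w0·x0 + w1·x1) / (w0 + w1) = x0 + t·(x1 - x0) with t = w1 / (w0 + w1). A natural reading of the paragraph above, assumed here rather than taken from mergekit's source, is that `nuslerp` derives its `t` the same way and then interpolates along the sphere instead of the straight line. The sketch below checks that identity; `weights_to_t` and `linear_normalized` are hypothetical names.

```python
import numpy as np


def weights_to_t(w0, w1):
    """Hypothetical helper: interpolation factor from two per-model weights.

    Mirrors normalize: true by dividing by the weight sum, so t is the
    second model's share of the total weight.
    """
    total = w0 + w1
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return w1 / total


def linear_normalized(tensors, weights):
    """Weighted average divided by the sum of weights (normalized linear merge)."""
    w = np.asarray(weights, dtype=float)
    # Contract the weight vector against the leading (model) axis of the stack
    return np.tensordot(w, np.stack(tensors), axes=1) / w.sum()


x0 = np.array([1.0, 2.0, 3.0])
x1 = np.array([5.0, 6.0, 7.0])
t = weights_to_t(1.0, 3.0)
# The normalized weighted average equals linear interpolation at this t
merged = linear_normalized([x0, x1], [1.0, 3.0])
print(t, merged)
```

Because only the ratio of the weights matters, `weight: 1` / `weight: 3` and `weight: 2` / `weight: 6` give the same `t` of 0.75, which is the scale-invariance users already expect from `normalize: true`.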
The idea of adding task-vector SLERP was inspired by DeepMind's great use of it in their WARP paper.