NuSLERP

Open cg123 opened this issue 7 months ago • 1 comments

Adds a new merge method, nuslerp, which offers a superset of the functionality of slerp. If provided with a base model, nuslerp performs spherical interpolation of the task vectors (each model's weights minus the base model's). Whereas the original slerp always flattens weight tensors into a single dimension, nuslerp can also interpolate tensors row-wise or column-wise.
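To make the distinction concrete, here is a minimal pure-Python sketch of the idea, not the code from this PR. It assumes one common SLERP formulation (angle-based weights applied to the unnormalized vectors, with a linear fallback for degenerate cases). The names slerp, slerp_matrix, and task_vector_slerp are hypothetical helpers invented for illustration, and column-wise interpolation is omitted for brevity.

```python
import math


def slerp(t, v0, v1, eps=1e-8):
    """Spherically interpolate between two flat vectors (lists of floats)."""
    linear = [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    n0 = math.sqrt(sum(x * x for x in v0))
    n1 = math.sqrt(sum(x * x for x in v1))
    if n0 < eps or n1 < eps:
        # The angle is undefined for a zero vector; fall back to linear.
        return linear
    cos_theta = sum(a * b for a, b in zip(v0, v1)) / (n0 * n1)
    cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding
    theta = math.acos(cos_theta)
    sin_theta = math.sin(theta)
    if abs(sin_theta) < eps:
        # (Anti-)parallel vectors: the spherical weights are ill-conditioned.
        return linear
    w0 = math.sin((1 - t) * theta) / sin_theta
    w1 = math.sin(t * theta) / sin_theta
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]


def slerp_matrix(t, m0, m1, row_wise=False):
    """Interpolate two matrices (lists of rows), flattened or row by row."""
    if row_wise:
        # Each row gets its own interpolation angle.
        return [slerp(t, r0, r1) for r0, r1 in zip(m0, m1)]
    # Flattened: one angle for the whole tensor, then restore the shape.
    cols = len(m0[0])
    flat = slerp(t, [x for r in m0 for x in r], [x for r in m1 for x in r])
    return [flat[i:i + cols] for i in range(0, len(flat), cols)]


def task_vector_slerp(t, base, a, b, row_wise=False):
    """SLERP the task vectors (model minus base), then add the base back."""
    ta = [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, base)]
    tb = [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(b, base)]
    merged = slerp_matrix(t, ta, tb, row_wise=row_wise)
    return [[y + m for y, m in zip(rb, rm)] for rb, rm in zip(base, merged)]
```

With row_wise=False the whole tensor shares a single interpolation angle, as in the original slerp; with row_wise=True every row is interpolated along its own arc.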

This method remedies one of my long-standing gripes with how I implemented slerp. Instead of taking a t parameter and using base_model to specify which is the "first" model, nuslerp simply takes a weight parameter for each model and computes the interpolation factor t internally. This makes it fit the conventions of the other merge methods much better. The weight parameter behaves in the same fashion as it does for merge_method: linear with normalize: true.
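As a rough illustration of how two per-model weights could be collapsed into a single interpolation factor, here is a small sketch. It assumes the weights are normalized to sum to 1, as with normalize: true in the linear method, and that t is the normalized weight of the second model. The function name interpolation_factor is hypothetical, and the actual implementation may differ.

```python
def interpolation_factor(weight_a, weight_b):
    """Map two model weights to t in [0, 1]; t=0 selects A, t=1 selects B."""
    total = weight_a + weight_b
    if total == 0:
        raise ValueError("weights must not sum to zero")
    # Normalizing makes only the ratio of the weights matter.
    return weight_b / total


# Equal weights land at the midpoint; weights 3 and 1 favor the first model.
t_equal = interpolation_factor(1.0, 1.0)   # 0.5
t_skewed = interpolation_factor(3.0, 1.0)  # 0.25
```

Under this assumption, scaling both weights by the same constant leaves t unchanged, so neither model has to be designated as the "first" one.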

The idea to add task vector SLERP is inspired by DeepMind's great use of it in their WARP paper.

cg123, Jun 26 '24 04:06