mergekit
Tools for merging pretrained large language models.
I'm adding the ability to merge models with different numbers of parameters (Bs) that have the same number of layers, through task arithmetic. I kinda hardcoded generalized task arithmetic to make...
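For context, here is a minimal sketch of plain task arithmetic over PyTorch state dicts. It assumes all tensors already have matching shapes; the cross-size case described above would need extra handling, and the function name is illustrative rather than mergekit's API.

```python
from typing import Dict, List
import torch

def task_arithmetic_merge(
    base: Dict[str, torch.Tensor],
    finetuned: List[Dict[str, torch.Tensor]],
    weights: List[float],
) -> Dict[str, torch.Tensor]:
    """Merge by adding a weighted sum of task vectors (deltas) to the base."""
    merged = {}
    for name, base_tensor in base.items():
        delta = torch.zeros_like(base_tensor, dtype=torch.float32)
        for sd, w in zip(finetuned, weights):
            # Task vector: the fine-tuned weights minus the base weights.
            delta += w * (sd[name].float() - base_tensor.float())
        merged[name] = (base_tensor.float() + delta).to(base_tensor.dtype)
    return merged
```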
FYI the tensor names changed in Phi 2
Effectively I want to run multiple models unaltered right up to the final softmax layer and then take a weighted sum of the pre-softmax inputs. This is mathematically...
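A minimal sketch of that kind of logit-level ensembling with transformers, assuming the models share a tokenizer and vocabulary; the model names and weights below are placeholders.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_names = ["model-a", "model-b"]  # placeholder checkpoints with a shared vocab
weights = [0.6, 0.4]

tokenizer = AutoTokenizer.from_pretrained(model_names[0])
models = [AutoModelForCausalLM.from_pretrained(n).eval() for n in model_names]

inputs = tokenizer("The capital of France is", return_tensors="pt")
with torch.no_grad():
    # Each model yields logits of shape (batch, seq_len, vocab); combine the
    # last position's pre-softmax values with a weighted sum, then sample.
    combined = sum(w * m(**inputs).logits[:, -1, :] for m, w in zip(models, weights))
next_token = torch.argmax(torch.softmax(combined, dim=-1), dim=-1)
print(tokenizer.decode(next_token))
```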
Excuse me, I have a few questions to ask, and I am looking forward to your answer: I use passthrough and slerp to merge qwen14B; here is my passthrough yaml: ```yaml...
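For reference, a minimal sketch of the spherical linear interpolation that a slerp merge applies per tensor; the function and its epsilon handling are illustrative, not mergekit's implementation.

```python
import torch

def slerp(t: float, v0: torch.Tensor, v1: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Spherically interpolate between two weight tensors at fraction t."""
    a, b = v0.flatten().float(), v1.flatten().float()
    # Angle between the two flattened weight vectors.
    cos_theta = torch.clamp(torch.dot(a, b) / (a.norm() * b.norm() + eps), -1.0, 1.0)
    theta = torch.acos(cos_theta)
    sin_theta = torch.sin(theta)
    if sin_theta.abs() < eps:
        # Nearly parallel vectors: fall back to linear interpolation.
        out = (1.0 - t) * a + t * b
    else:
        out = (torch.sin((1.0 - t) * theta) * a + torch.sin(t * theta) * b) / sin_theta
    return out.reshape(v0.shape).to(v0.dtype)
```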
I successfully created three 103b stacked merges from three 70b models each. Now I'm trying to do a linear merge between those three 103b stacks. I had...
Thanks for your wonderful work. Currently mergekit-moe supports merging experts and activating 2 of them. Can we change the number of activated experts, such as activating 4 experts?
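The number of experts activated per token in a Mixtral-style MoE is a config field (`num_experts_per_tok`) rather than something baked into the merged weights, so it can in principle be raised after the merge. A sketch with transformers, assuming a merged model at the placeholder path "./merged-moe":

```python
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("./merged-moe")
config.num_experts_per_tok = 4  # route each token through 4 experts instead of 2
model = AutoModelForCausalLM.from_pretrained("./merged-moe", config=config)
```

Keep in mind the merged gates were not tuned with four active experts, so output quality may differ.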
Here, the tensors in the `cheap_embed` path are 4-dimensional: https://github.com/cg123/mergekit/blob/d55f654c2e70d3ac4ad6532de96e266aff2de931/mergekit/scripts/mixtral_moe.py#L87 However, `gate_vec` receives a 3-dimensional tensor. https://github.com/cg123/mergekit/blob/d55f654c2e70d3ac4ad6532de96e266aff2de931/mergekit/scripts/mixtral_moe.py#L158-L161
The parameter descriptions for "hidden" and "random" do not exactly explain what to do when I want to finetune later. Is it even useful (possible) to finetune after merging with...