mergekit
Tools for merging pretrained large language models.
I'm adding the ability to merge models with different numbers of parameters (Bs) that have the same number of layers, through task arithmetic. I kinda hardcoded generalized task arithmetic to make...
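For context, here is a minimal sketch of plain task arithmetic over PyTorch state dicts. It assumes all tensors already have matching shapes; the cross-size case described above would need extra handling, and the function name is illustrative rather than mergekit's API.

```python
from typing import Dict, List
import torch

def task_arithmetic_merge(
    base: Dict[str, torch.Tensor],
    finetuned: List[Dict[str, torch.Tensor]],
    weights: List[float],
) -> Dict[str, torch.Tensor]:
    """Merge by adding a weighted sum of task vectors (deltas) to the base."""
    merged = {}
    for name, base_tensor in base.items():
        delta = torch.zeros_like(base_tensor, dtype=torch.float32)
        for sd, w in zip(finetuned, weights):
            # Task vector: the fine-tuned weights minus the base weights.
            delta += w * (sd[name].float() - base_tensor.float())
        merged[name] = (base_tensor.float() + delta).to(base_tensor.dtype)
    return merged
```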
FYI the tensor names changed in Phi 2
Effectively I want to run multiple models unaltered right up to the final softmax layer and then take a weighted sum of the pre-softmax inputs. This is mathematically...
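A minimal sketch of that kind of logit-level ensembling with transformers, assuming the models share a tokenizer and vocabulary; the model names and weights below are placeholders.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_names = ["model-a", "model-b"]  # placeholder checkpoints with a shared vocab
weights = [0.6, 0.4]

tokenizer = AutoTokenizer.from_pretrained(model_names[0])
models = [AutoModelForCausalLM.from_pretrained(n).eval() for n in model_names]

inputs = tokenizer("The capital of France is", return_tensors="pt")
with torch.no_grad():
    # Each model yields logits of shape (batch, seq_len, vocab); combine the
    # last position's pre-softmax values with a weighted sum, then sample.
    combined = sum(w * m(**inputs).logits[:, -1, :] for m, w in zip(models, weights))
next_token = torch.argmax(torch.softmax(combined, dim=-1), dim=-1)
print(tokenizer.decode(next_token))
```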
Excuse me, I have a few questions to ask, and I am looking forward to your answer: I use passthrough and slerp to merge qwen14B; here is my passthrough yaml: ```yaml...
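For reference, a minimal sketch of the spherical linear interpolation that a slerp merge applies per tensor; the function and its epsilon handling are illustrative, not mergekit's implementation.

```python
import torch

def slerp(t: float, v0: torch.Tensor, v1: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Spherically interpolate between two weight tensors at fraction t."""
    a, b = v0.flatten().float(), v1.flatten().float()
    # Angle between the two flattened weight vectors.
    cos_theta = torch.clamp(torch.dot(a, b) / (a.norm() * b.norm() + eps), -1.0, 1.0)
    theta = torch.acos(cos_theta)
    sin_theta = torch.sin(theta)
    if sin_theta.abs() < eps:
        # Nearly parallel vectors: fall back to linear interpolation.
        out = (1.0 - t) * a + t * b
    else:
        out = (torch.sin((1.0 - t) * theta) * a + torch.sin(t * theta) * b) / sin_theta
    return out.reshape(v0.shape).to(v0.dtype)
```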
I successfully created three 103b stacked merges from three 70b models each. Now I'm trying to do a linear merge between those three 103b stacks. I had...
Thanks for your wonderful work. Currently mergekit-moe supports merging experts and activating 2 of them. Can we change the number of activated experts, such as activating 4 experts?
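The number of experts activated per token in a Mixtral-style MoE is a config field (`num_experts_per_tok`) rather than something baked into the merged weights, so it can in principle be raised after the merge. A sketch with transformers, assuming a merged model at the placeholder path "./merged-moe":

```python
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("./merged-moe")
config.num_experts_per_tok = 4  # route each token through 4 experts instead of 2
model = AutoModelForCausalLM.from_pretrained("./merged-moe", config=config)
```

Keep in mind the merged gates were not tuned with four active experts, so output quality may differ.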
Here, the tensors in the `cheap_embed` path are 4-dimensional: https://github.com/cg123/mergekit/blob/d55f654c2e70d3ac4ad6532de96e266aff2de931/mergekit/scripts/mixtral_moe.py#L87 However, `gate_vec` receives a 3-dimensional tensor. https://github.com/cg123/mergekit/blob/d55f654c2e70d3ac4ad6532de96e266aff2de931/mergekit/scripts/mixtral_moe.py#L158-L161
The parameter descriptions for "hidden" and "random" do not exactly explain what to do when I want to finetune later. Is it even useful (possible) to finetune after merging with...