mergekit
Tools for merging pretrained large language models.
Hi! Thanks for your great work! I have two questions. (1) When I use the following setting:

```yaml
models:
  - model: /data2/model/Quantize/llama2-chat_normal
    parameters:
      weight: 0.1
  - model: /data2/model/Quantize/llama2-chat_normal
    parameters:
      weight:...
```
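For intuition, the effect of per-model `weight` values in a linear merge can be sketched in plain Python. This is an illustrative toy, not mergekit's internal code: `linear_merge`, the dict-of-lists "state dicts", and the example values are all hypothetical.

```python
def linear_merge(state_dicts, weights):
    """Weighted average of matching parameters across model state dicts.

    Weights are normalized by their sum, so [0.1, 0.9] yields a convex
    combination of the two models' parameters.
    """
    total = sum(weights)
    merged = {}
    for name in state_dicts[0]:
        merged[name] = [
            sum(w * sd[name][i] for w, sd in zip(weights, state_dicts)) / total
            for i in range(len(state_dicts[0][name]))
        ]
    return merged

# toy "models" with a single parameter tensor each (hypothetical values)
a = {"w": [1.0, 2.0]}
b = {"w": [3.0, 6.0]}
out = linear_merge([a, b], [0.1, 0.9])
# out["w"] == [0.1*1 + 0.9*3, 0.1*2 + 0.9*6] == [2.8, 5.6]
```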
## Problem

In a Mixture of Experts (MoE) LLM, the gating network outputs a categorical distribution over $n$ values (chosen from $n_{max}$), which is then used to create a convex...
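The gating step described above can be sketched as top-k routing: keep the $n$ largest expert logits out of $n_{max}$ and renormalize with a softmax, which yields the convex weights. A minimal stdlib-only sketch (`topk_gating` and the example logits are my own illustrative names/values, not mergekit code):

```python
import math

def topk_gating(logits, k):
    """Keep the top-k expert logits and softmax over them.

    Returns a dict {expert_index: weight}; the weights are positive and
    sum to 1, i.e. a categorical distribution over the chosen experts
    usable as convex combination coefficients.
    """
    idx = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in idx)             # subtract max for stability
    exps = {i: math.exp(logits[i] - m) for i in idx}
    z = sum(exps.values())
    return {i: e / z for i, e in exps.items()}

# n_max = 4 experts, route to n = 2 (hypothetical logits)
w = topk_gating([2.0, 1.0, 0.5, -1.0], k=2)
# w has keys {0, 1} and its values sum to 1
```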
Has anyone tried downscaling the K and/or Q matrices for repeated layers in franken-merges? This should act like changing the temperature of the softmax and effectively smooth the distribution: **Hopfield...
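The claimed equivalence is easy to check numerically: attention logits are $qK^\top/\sqrt{d}$, so scaling Q (or K) by a factor $s < 1$ scales every logit by $s$, which is exactly a softmax at temperature $1/s$ and flattens the distribution. A stdlib-only sketch (the logit values and scale factor are arbitrary examples):

```python
import math

def softmax(xs):
    m = max(xs)                      # subtract max for numerical stability
    e = [math.exp(x - m) for x in xs]
    z = sum(e)
    return [v / z for v in e]

logits = [4.0, 2.0, 1.0]   # stand-in for one row of q @ K^T / sqrt(d)
s = 0.5                    # downscale factor applied to Q (or K)

scaled = softmax([s * l for l in logits])          # attention with downscaled Q/K
temp   = softmax([l / (1.0 / s) for l in logits])  # softmax at temperature 1/s
# 'scaled' and 'temp' are identical, and both are flatter (closer to
# uniform) than softmax(logits) -- the smoothing effect described above
```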
If one trains at a context window of 8K, can one merge with a model of the same architecture that has a longer context window? Say, merging https://huggingface.co/meta-llama/Meta-Llama-3-8B (trained at 8K) into https://huggingface.co/NurtureAI/Meta-Llama-3-8B-Instruct-64k
I'm trying to merge some embedding models with this config file. The architectures are similar, but I think it is erroring out on some layer names? Would love some...
```bash
python dump_out.py gpt2 -o dump_output --dump-type hidden-state -d metric-space/experiment_med -s 2 -c question -u part1
```

```bash
python dump_out.py gpt2 -o dump_output --dump-type activation -d metric-space/experiment_med -s 2 -c...
```
Thanks for this amazing work. It makes merging models much easier. I read this paper recently, and the proposed method, Variation Ratio Merge (VARM), is also a novel merge...
In the .ipynb notebook linked below, where can I specify a GPU for the merging process to run on? https://github.com/arcee-ai/mergekit/blob/main/notebook.ipynb
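One generic way to pin the merge to a single GPU, independent of any mergekit-specific option, is to restrict device visibility before any CUDA-using library is imported. This is standard CUDA behavior, not a documented mergekit API; the device index `"1"` is an arbitrary example.

```python
# Restrict the process to one GPU via CUDA_VISIBLE_DEVICES.
# This must be set BEFORE torch (or any other CUDA library) is imported,
# e.g. in the first cell of the notebook.
import os

os.environ["CUDA_VISIBLE_DEVICES"] = "1"  # physical GPU 1 (example index)

# From here on, CUDA code sees only that GPU, exposed as device 0,
# so the merge runs on it without further changes.
```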
Happy to see that you have implemented evolutionary merging. I tried to follow your tutorial: https://blog.arcee.ai/tutorial-tutorial-how-to-get-started-with-evolutionary-model-merging/ The installation example causes an error. I almost gave up but then I found...
Hi @cg123, great library, thanks a lot, super useful! I've finetuned GPT2 on two tasks (model1 and model2) and am trying to merge them using your repo. It turns out, using...