mergekit
Tools for merging pretrained large language models.
Hi! Thanks for your great work! I have two questions. (1) When I use the following setting:

```yaml
models:
  - model: /data2/model/Quantize/llama2-chat_normal
    parameters:
      weight: 0.1
  - model: /data2/model/Quantize/llama2-chat_normal
    parameters:
      weight:...
```
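For intuition, the effect of per-model `weight` values in a linear merge can be sketched in plain Python. This is an illustrative toy, not mergekit's internal code: `linear_merge`, the dict-of-lists "state dicts", and the example values are all hypothetical.

```python
def linear_merge(state_dicts, weights):
    """Weighted average of matching parameters across model state dicts.

    Weights are normalized by their sum, so [0.1, 0.9] yields a convex
    combination of the two models' parameters.
    """
    total = sum(weights)
    merged = {}
    for name in state_dicts[0]:
        merged[name] = [
            sum(w * sd[name][i] for w, sd in zip(weights, state_dicts)) / total
            for i in range(len(state_dicts[0][name]))
        ]
    return merged

# toy "models" with a single parameter tensor each (hypothetical values)
a = {"w": [1.0, 2.0]}
b = {"w": [3.0, 6.0]}
out = linear_merge([a, b], [0.1, 0.9])
# out["w"] == [0.1*1 + 0.9*3, 0.1*2 + 0.9*6] == [2.8, 5.6]
```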
## Problem

In a Mixture of Experts (MoE) LLM, the gating network outputs a categorical distribution over $n$ values (chosen from $n_{max}$), which is then used to create a convex...
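The gating step described above can be sketched as top-k routing: keep the $n$ largest expert logits out of $n_{max}$ and renormalize with a softmax, which yields the convex weights. A minimal stdlib-only sketch (`topk_gating` and the example logits are my own illustrative names/values, not mergekit code):

```python
import math

def topk_gating(logits, k):
    """Keep the top-k expert logits and softmax over them.

    Returns a dict {expert_index: weight}; the weights are positive and
    sum to 1, i.e. a categorical distribution over the chosen experts
    usable as convex combination coefficients.
    """
    idx = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in idx)             # subtract max for stability
    exps = {i: math.exp(logits[i] - m) for i in idx}
    z = sum(exps.values())
    return {i: e / z for i, e in exps.items()}

# n_max = 4 experts, route to n = 2 (hypothetical logits)
w = topk_gating([2.0, 1.0, 0.5, -1.0], k=2)
# w has keys {0, 1} and its values sum to 1
```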
Has anyone tried downscaling the K and/or Q matrices for repeated layers in franken-merges? This should act like changing the temperature of the softmax and effectively smooth the distribution: **Hopfield...
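The claimed equivalence is easy to check numerically: attention logits are $qK^\top/\sqrt{d}$, so scaling Q (or K) by a factor $s < 1$ scales every logit by $s$, which is exactly a softmax at temperature $1/s$ and flattens the distribution. A stdlib-only sketch (the logit values and scale factor are arbitrary examples):

```python
import math

def softmax(xs):
    m = max(xs)                      # subtract max for numerical stability
    e = [math.exp(x - m) for x in xs]
    z = sum(e)
    return [v / z for v in e]

logits = [4.0, 2.0, 1.0]   # stand-in for one row of q @ K^T / sqrt(d)
s = 0.5                    # downscale factor applied to Q (or K)

scaled = softmax([s * l for l in logits])          # attention with downscaled Q/K
temp   = softmax([l / (1.0 / s) for l in logits])  # softmax at temperature 1/s
# 'scaled' and 'temp' are identical, and both are flatter (closer to
# uniform) than softmax(logits) -- the smoothing effect described above
```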
If one trains at a context window of 8K, can one merge with a model of the same architecture that has a longer context window? Say, merging https://huggingface.co/meta-llama/Meta-Llama-3-8B (trained at 8K) into https://huggingface.co/NurtureAI/Meta-Llama-3-8B-Instruct-64k
I'm trying to merge some embedding models with this config file. The architectures are similar, but I think it is erroring out on some layer names? Would love some...
```bash
python dump_out.py gpt2 -o dump_output --dump-type hidden-state -d metric-space/experiment_med -s 2 -c question -u part1
```

```bash
python dump_out.py gpt2 -o dump_output --dump-type activation -d metric-space/experiment_med -s 2 -c...
```
Thanks for this amazing work. It makes merging models much easier. I read this paper recently, and the proposed method, Variation Ratio Merge (VARM), is also a novel merge...
In the .ipynb notebook linked below, where can I specify a GPU for the merging process to run on? https://github.com/arcee-ai/mergekit/blob/main/notebook.ipynb
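One generic way to pin the merge to a single GPU, independent of any mergekit-specific option, is to restrict device visibility before any CUDA-using library is imported. This is standard CUDA behavior, not a documented mergekit API; the device index `"1"` is an arbitrary example.

```python
# Restrict the process to one GPU via CUDA_VISIBLE_DEVICES.
# This must be set BEFORE torch (or any other CUDA library) is imported,
# e.g. in the first cell of the notebook.
import os

os.environ["CUDA_VISIBLE_DEVICES"] = "1"  # physical GPU 1 (example index)

# From here on, CUDA code sees only that GPU, exposed as device 0,
# so the merge runs on it without further changes.
```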
Happy to see that you have implemented evolutionary merging. I tried to follow your tutorial: https://blog.arcee.ai/tutorial-tutorial-how-to-get-started-with-evolutionary-model-merging/ The installation example causes an error. I almost gave up but then I found...
Hi @cg123, great library, thanks a lot, super useful! I've finetuned GPT2 on two tasks (model1 and model2) and am trying to merge them using your repo. It turns out, using...