mergekit
Tools for merging pretrained large language models.
I used mergekit-moe to build a MoE model from several copies of the same Gemma model (gate mode: hidden), but the resulting model produces meaningless output like that shown in https://github.com/arcee-ai/mergekit/issues/218#issuecomment-2027402773. This didn't happen when I merged models...
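For reference, a minimal mergekit-moe configuration with a hidden-state gate might look like the sketch below; the model name and prompts are illustrative placeholders, not the reporter's actual setup.

```yaml
# Hypothetical mergekit-moe config: several experts cloned from the same base
# model, with the router initialized from hidden-state activations.
base_model: google/gemma-7b
gate_mode: hidden        # other options include cheap_embed and random
dtype: bfloat16
experts:
  - source_model: google/gemma-7b
    positive_prompts:
      - "general conversation"
  - source_model: google/gemma-7b
    positive_prompts:
      - "code and math"
```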
Are there any plans to support fine-grained experts in the future? Fine-grained experts are a technique adopted in projects like Qwen MoE and DeepSeek MoE, and they have shown promising results....
I am trying to condense a model by 1/4. I want to merge every 4th layer onto the previous 3 layers. When I try this I get 0 layers on...
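Layer selection in mergekit goes through the slices syntax; below is a minimal sketch of dropping layers with a passthrough merge. The model name and layer ranges are illustrative, and note this keeps a contiguous subset of layers rather than averaging each 4th layer onto its neighbors.

```yaml
# Hypothetical passthrough config that keeps only part of the layer stack.
# Averaging every 4th layer onto the previous 3 would need a different approach.
slices:
  - sources:
      - model: mistralai/Mistral-7B-v0.1
        layer_range: [0, 24]   # keep the first 24 of 32 layers
merge_method: passthrough
dtype: bfloat16
```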
BitNet is now supported in many architectures. A good first step would be to add support for it to mergekit.
Hi team, could you please tell me which merging method you used for Arcee-Spark? Thanks.
Hello! I have two models, CodeLLaMa-13b-Python and CodeLLaMa-13b, that need to be merged. The overall goal is to merge two models (one trained on Python and another trained...
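A straightforward way to combine two sibling checkpoints like these is a weighted linear merge; the sketch below is illustrative only, with assumed Hugging Face repo names and equal weights.

```yaml
# Hypothetical linear merge of the Python-specialized and base CodeLlama checkpoints.
models:
  - model: codellama/CodeLlama-13b-Python-hf
    parameters:
      weight: 0.5
  - model: codellama/CodeLlama-13b-hf
    parameters:
      weight: 0.5
merge_method: linear
dtype: float16
```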
Considering I have metered internet and limited resources, I followed your guide and the notebook. I used this yaml:
```yaml
slices:
  - sources:
      - model: mistralai/Mistral-7B-Instruct-v0.3
        layer_range: [0,...
```
Hello, the documentation seems to be a little sparse on this, or the feature doesn't exist yet: is there a way to pass lm-eval-harness arguments directly while running mergekit-evolve? For instance: the...
Hi authors, you may want to consider implementing this in the toolkit: https://arxiv.org/pdf/2406.07529v1 (code: https://github.com/luli-git/MAP). It's an advanced version of model merging based on task vectors. Thanks!
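For context, task-vector merging (which MAP builds on) combines models through their differences from a shared base; a sketch of the standard formulation is below, where the per-task coefficients $\lambda_i$ are what methods like MAP try to choose well.

$$\tau_i = \theta_i^{\mathrm{ft}} - \theta_{\mathrm{base}}, \qquad \theta_{\mathrm{merged}} = \theta_{\mathrm{base}} + \sum_i \lambda_i \, \tau_i$$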
I want to share a new merging algorithm: Geometric Median. It is simple, somewhat stackable, and currently **tested with 116 unfiltered SDXL models**. Sadly, due to the historic...
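As background, the geometric median of weight tensors $\theta_1,\dots,\theta_n$ is the point minimizing the total Euclidean distance, commonly approximated with Weiszfeld's iteration; this is the textbook definition, not necessarily the exact variant proposed in the issue.

$$\hat{\theta} = \arg\min_{\theta} \sum_{i=1}^{n} \lVert \theta - \theta_i \rVert_2, \qquad \theta^{(t+1)} = \frac{\sum_{i} \theta_i \,/\, \lVert \theta^{(t)} - \theta_i \rVert_2}{\sum_{i} 1 \,/\, \lVert \theta^{(t)} - \theta_i \rVert_2}$$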