
Tools for merging pretrained large language models.

Results 231 mergekit issues

When attempting to merge three 14B LLMs on a custom task using the mergekit-evolve method, I ran into out-of-memory issues on 8 A100 GPUs, each with 80 GB of memory....

Config file to reproduce:

```yaml
base_model: meta-llama/Meta-Llama-3-70B-Instruct
dtype: bfloat16
merge_method: slerp
parameters:
  t:
    - filter: self_attn
      value:
        - 0
        - 0.5
        - 0.3
        - 0.7
        - 1
    - filter: mlp...
```

Hello, thank you for your great work! I am new to model merging and mergekit, and I am trying to merge only part of the parameters in each layer (like the FFN in...

I see this in a number of merges, but cannot get a clear read on its impact: `parameters: int8_mask: true`. Please advise; thanks D

Averaging a norm weight of 0.5 and a norm weight of 2.0 should give 1.0 rather than 1.25, and linear interpolation will similarly favour values > 1 compared to <...
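The bias described in this issue is easy to verify numerically. A minimal sketch, using the two norm weights from the excerpt, comparing the arithmetic mean (what plain linear averaging computes) against the geometric mean (which treats a 2x scale-up and a 2x scale-down symmetrically):

```python
import math

norms = [0.5, 2.0]

# Arithmetic mean favours values > 1: (0.5 + 2.0) / 2 = 1.25
arith = sum(norms) / len(norms)

# Geometric mean is symmetric under reciprocal scaling: sqrt(0.5 * 2.0) = 1.0
geo = math.exp(sum(math.log(n) for n in norms) / len(norms))

print(arith)  # 1.25
print(geo)    # 1.0
```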

I see that Phi-3 support has been added. Could we add support for microsoft/Phi-3-vision-128k-instruct? Thanks.

Hi team, thanks for the contribution! I am new to model merging and have some questions regarding the density gradient and weight gradient in the TIES example (https://github.com/arcee-ai/mergekit/blob/main/examples/ties.yml). 1....

I tried to extract a LoRA from `Xwin-LM/Xwin-Math-70B-V1.1` and got this:

```
delta_weight = new_weight - base_weight
              ~~~~~~~~~~~^~~~~~~~~~~~~
RuntimeError: The size of tensor a (32002) must match the size of...
```
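The failing subtraction above can be reproduced in miniature. A sketch assuming the mismatch comes from the fine-tune having extended its vocabulary (32002 embedding rows vs. a hypothetical 32000-row base; the hidden size of 4 is made up for the demo), using NumPy in place of PyTorch since both refuse to broadcast mismatched leading dimensions:

```python
import numpy as np

# Hypothetical shapes: the fine-tuned checkpoint added tokens, so its
# embedding matrix has 32002 rows while the base model's has 32000.
new_weight = np.zeros((32002, 4))
base_weight = np.zeros((32000, 4))

try:
    delta_weight = new_weight - base_weight  # the step LoRA extraction performs
except ValueError as err:
    # Mismatched first dimensions cannot broadcast, mirroring the RuntimeError
    print("shape mismatch:", err)
```

Trimming or padding the embedding matrices to a common vocabulary size before subtracting is one way around this class of error.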

So, for some reason it's not fully allocating my CPU, and if I run it using CUDA it takes 8 hours instead of 30 minutes. Is this tool only capable of...

Allows using bitsandbytes quantization in `mergekit-evolve` when a) not using vLLM and b) not using in-memory mode.