mergekit
Tools for merging pretrained large language models.
When attempting to merge three 14B LLMs on a custom task using `mergekit-evolve`, I ran into memory overflow issues on 8 A100 GPUs, each with 80 GB of memory....
Config file to reproduce:
```
base_model: meta-llama/Meta-Llama-3-70B-Instruct
dtype: bfloat16
merge_method: slerp
parameters:
  t:
    - filter: self_attn
      value:
        - 0
        - 0.5
        - 0.3
        - 0.7
        - 1
    - filter: mlp...
```
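For reference, a minimal sketch of what `merge_method: slerp` computes per tensor pair; this is an illustrative NumPy version, not mergekit's actual implementation, and my understanding is that a list-valued `t` like the one above acts as a gradient interpolated across layers for each filter:
```
import numpy as np

def slerp(t: float, v0: np.ndarray, v1: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Spherical linear interpolation between two weight tensors.

    Illustrative sketch only; mergekit's implementation handles
    degenerate cases and dtype plumbing that this version omits.
    """
    v0_flat, v1_flat = v0.ravel(), v1.ravel()
    v0_n = v0_flat / (np.linalg.norm(v0_flat) + eps)
    v1_n = v1_flat / (np.linalg.norm(v1_flat) + eps)
    dot = np.clip(v0_n @ v1_n, -1.0, 1.0)
    theta = np.arccos(dot)
    if theta < eps:  # nearly parallel: plain lerp is stable here
        return (1 - t) * v0 + t * v1
    s0 = np.sin((1 - t) * theta) / np.sin(theta)
    s1 = np.sin(t * theta) / np.sin(theta)
    return s0 * v0 + s1 * v1
```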
Hello, thank you for your great work! I am new to model merging and mergekit, and I am trying to merge a subset of the parameters in each layer (like the FFN in...
I see this in a number of merges, but cannot get a clear read on its impact:
```
parameters:
  int8_mask: true
```
Please advise; thanks, D
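For what it's worth, my reading of `int8_mask` (based on the option name and docs, not mergekit's exact code) is that it stores the intermediate sign-consensus mask used by TIES-style merges in int8 rather than the compute dtype, saving memory with no effect on the result; a hypothetical sketch of the idea:
```
import torch

# Illustrative only: TIES-style merging builds an elementwise
# sign-consensus mask over the stacked task vectors.
deltas = torch.stack([torch.randn(1024, 1024) for _ in range(3)])

majority_sign = deltas.sum(dim=0).sign()   # consensus sign per element
agrees = deltas.sign() == majority_sign    # which deltas match it

mask_int8 = agrees.to(torch.int8)          # int8_mask: true -> 1 byte/elem
mask_bf16 = agrees.to(torch.bfloat16)      # compute dtype  -> 2 bytes/elem
print(mask_int8.element_size(), mask_bf16.element_size())  # 1 2
```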
Averaging a norm weight of 0.5 and a norm weight of 2.0 should give 1.0 rather than 1.25, and linear interpolation will similarly favour values > 1 compared to <...
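A quick numeric illustration of the point (plain Python; the `log_lerp` helper is hypothetical): the arithmetic mean of multiplicative norm weights is biased upward, while averaging in log space treats 0.5x and 2x symmetrically:
```
import math

w1, w2 = 0.5, 2.0
print((w1 + w2) / 2)       # 1.25: arithmetic mean favours values > 1
print(math.sqrt(w1 * w2))  # 1.0: geometric mean is symmetric in log space

def log_lerp(a: float, b: float, t: float) -> float:
    """Interpolate multiplicative weights in log space (hypothetical helper)."""
    return math.exp((1 - t) * math.log(a) + t * math.log(b))

print(log_lerp(0.5, 2.0, 0.5))  # 1.0
```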
I see that Phi-3 support has been added. Could we add support for microsoft/Phi-3-vision-128k-instruct? Thanks.
Hi team, thanks for the contribution! I am new to model merging and have some questions regarding the density gradient and weight gradient in the TIES example (https://github.com/arcee-ai/mergekit/blob/main/examples/ties.yml). 1....
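For context, my understanding (an assumption based on mergekit's documentation, not verified against its source) is that a list-valued `density` or `weight` is a "gradient": the anchor values are linearly interpolated across layer indices to give one value per layer. A sketch of that expansion:
```
import numpy as np

def expand_gradient(anchors: list[float], num_layers: int) -> np.ndarray:
    """Linearly interpolate anchor values across layers (illustrative)."""
    anchor_pos = np.linspace(0.0, 1.0, num=len(anchors))
    layer_pos = np.linspace(0.0, 1.0, num=num_layers)
    return np.interp(layer_pos, anchor_pos, anchors)

print(expand_gradient([0.0, 0.5, 1.0], num_layers=5))
# [0.   0.25 0.5  0.75 1.  ]
```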
I tried to extract a LoRA from `Xwin-LM/Xwin-Math-70B-V1.1` and got this:
```
delta_weight = new_weight - base_weight
               ~~~~~~~~~~~^~~~~~~~~~~~~
RuntimeError: The size of tensor a (32002) must match the size of...
```
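The mismatch looks like an extended vocabulary: the fine-tuned embedding has 32002 rows versus the base model's 32000 (the base size and hidden dimension below are my assumptions). A minimal reproduction plus one possible workaround, diffing only the shared rows; this is not mergekit's actual behavior:
```
import torch

new_weight = torch.randn(32002, 8192)   # fine-tune added 2 tokens (assumed)
base_weight = torch.randn(32000, 8192)  # assumed base vocab size

# new_weight - base_weight  # RuntimeError: sizes 32002 vs 32000

shared = min(new_weight.shape[0], base_weight.shape[0])
delta_weight = new_weight[:shared] - base_weight[:shared]  # shared rows only
```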
So, for some reason it's not fully utilizing my CPU, and if I run it using CUDA it takes 8 hours instead of 30 minutes. Is this tool only capable of...
Allows using bitsandbytes quantization in `mergekit-evolve` when a) not using vLLM and b) not using in-memory mode.
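As a sketch of the kind of loading this enables (standard `transformers` + `bitsandbytes` usage; the actual wiring inside `mergekit-evolve` may differ, and the model name is just an example):
```
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 quantization via bitsandbytes to cut evaluation memory.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",  # example model, not from the PR
    quantization_config=bnb_config,
    device_map="auto",
)
```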