mergekit
Tools for merging pretrained large language models.
When attempting to merge three 14B LLMs on a custom task using `mergekit-evolve`, I ran into memory overflow issues on 8 A100 GPUs, each with 80 GB of memory....
Config file to reproduce:
```
base_model: meta-llama/Meta-Llama-3-70B-Instruct
dtype: bfloat16
merge_method: slerp
parameters:
  t:
    - filter: self_attn
      value:
        - 0
        - 0.5
        - 0.3
        - 0.7
        - 1
    - filter: mlp...
```
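For reference, a minimal sketch of what `merge_method: slerp` computes per tensor pair; this is an illustrative NumPy version, not mergekit's actual implementation, and my understanding is that a list-valued `t` like the one above acts as a gradient interpolated across layers for each filter:
```
import numpy as np

def slerp(t: float, v0: np.ndarray, v1: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Spherical linear interpolation between two weight tensors.

    Illustrative sketch only; mergekit's implementation handles
    degenerate cases and dtype plumbing that this version omits.
    """
    v0_flat, v1_flat = v0.ravel(), v1.ravel()
    v0_n = v0_flat / (np.linalg.norm(v0_flat) + eps)
    v1_n = v1_flat / (np.linalg.norm(v1_flat) + eps)
    dot = np.clip(v0_n @ v1_n, -1.0, 1.0)
    theta = np.arccos(dot)
    if theta < eps:  # nearly parallel: plain lerp is stable here
        return (1 - t) * v0 + t * v1
    s0 = np.sin((1 - t) * theta) / np.sin(theta)
    s1 = np.sin(t * theta) / np.sin(theta)
    return s0 * v0 + s1 * v1
```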
Hello, thank you for your great work! I am new to model merging and mergekit, and I am trying to merge a subset of the parameters in each layer (like the FFN in...
I see this in a number of merges, but cannot get a clear read on its impact:
```
parameters:
  int8_mask: true
```
Please advise; thanks, D
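For what it's worth, my reading of `int8_mask` (based on the option name and docs, not mergekit's exact code) is that it stores the intermediate sign-consensus mask used by TIES-style merges in int8 rather than the compute dtype, saving memory with no effect on the result; a hypothetical sketch of the idea:
```
import torch

# Illustrative only: TIES-style merging builds an elementwise
# sign-consensus mask over the stacked task vectors.
deltas = torch.stack([torch.randn(1024, 1024) for _ in range(3)])

majority_sign = deltas.sum(dim=0).sign()   # consensus sign per element
agrees = deltas.sign() == majority_sign    # which deltas match it

mask_int8 = agrees.to(torch.int8)          # int8_mask: true -> 1 byte/elem
mask_bf16 = agrees.to(torch.bfloat16)      # compute dtype  -> 2 bytes/elem
print(mask_int8.element_size(), mask_bf16.element_size())  # 1 2
```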
Averaging a norm weight of 0.5 and a norm weight of 2.0 should give 1.0 rather than 1.25, and linear interpolation will similarly favour values > 1 compared to <...
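A quick numeric illustration of the point (plain Python; the `log_lerp` helper is hypothetical): the arithmetic mean of multiplicative norm weights is biased upward, while averaging in log space treats 0.5x and 2x symmetrically:
```
import math

w1, w2 = 0.5, 2.0
print((w1 + w2) / 2)       # 1.25: arithmetic mean favours values > 1
print(math.sqrt(w1 * w2))  # 1.0: geometric mean is symmetric in log space

def log_lerp(a: float, b: float, t: float) -> float:
    """Interpolate multiplicative weights in log space (hypothetical helper)."""
    return math.exp((1 - t) * math.log(a) + t * math.log(b))

print(log_lerp(0.5, 2.0, 0.5))  # 1.0
```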
I see that Phi-3 support has been added. Could we add support for microsoft/Phi-3-vision-128k-instruct? Thanks.
Hi team, thanks for the contribution! I am new to model merging and have some questions regarding the density gradient and weight gradient in the TIES example (https://github.com/arcee-ai/mergekit/blob/main/examples/ties.yml). 1....
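For context, my understanding (an assumption based on mergekit's documentation, not verified against its source) is that a list-valued `density` or `weight` is a "gradient": the anchor values are linearly interpolated across layer indices to give one value per layer. A sketch of that expansion:
```
import numpy as np

def expand_gradient(anchors: list[float], num_layers: int) -> np.ndarray:
    """Linearly interpolate anchor values across layers (illustrative)."""
    anchor_pos = np.linspace(0.0, 1.0, num=len(anchors))
    layer_pos = np.linspace(0.0, 1.0, num=num_layers)
    return np.interp(layer_pos, anchor_pos, anchors)

print(expand_gradient([0.0, 0.5, 1.0], num_layers=5))
# [0.   0.25 0.5  0.75 1.  ]
```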
I tried to extract a LoRA from `Xwin-LM/Xwin-Math-70B-V1.1` and got this:
```
delta_weight = new_weight - base_weight
               ~~~~~~~~~~~^~~~~~~~~~~~~
RuntimeError: The size of tensor a (32002) must match the size of...
```
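The mismatch looks like an extended vocabulary: the fine-tuned embedding has 32002 rows versus the base model's 32000 (the base size and hidden dimension below are my assumptions). A minimal reproduction plus one possible workaround, diffing only the shared rows; this is not mergekit's actual behavior:
```
import torch

new_weight = torch.randn(32002, 8192)   # fine-tune added 2 tokens (assumed)
base_weight = torch.randn(32000, 8192)  # assumed base vocab size

# new_weight - base_weight  # RuntimeError: sizes 32002 vs 32000

shared = min(new_weight.shape[0], base_weight.shape[0])
delta_weight = new_weight[:shared] - base_weight[:shared]  # shared rows only
```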
So, for some reason it's not fully utilizing my CPU, and if I run it using CUDA it takes 8 hours instead of 30 minutes. Is this tool only capable of...
Allows using bitsandbytes quantization in `mergekit-evolve` when a) not using vLLM and b) not using in-memory mode.
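As a sketch of the kind of loading this enables (standard `transformers` + `bitsandbytes` usage; the actual wiring inside `mergekit-evolve` may differ, and the model name is just an example):
```
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 quantization via bitsandbytes to cut evaluation memory.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",  # example model, not from the PR
    quantization_config=bnb_config,
    device_map="auto",
)
```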