mergekit

Tools for merging pretrained large language models.

Results: 231 mergekit issues, sorted by recently updated

Thank you for open-sourcing such a great tool. When executing the GTATask, I use LoadTensor twice to load additional tensors, but this makes the execution very slow. How can I...
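
A minimal, self-contained sketch of the caching idea behind this question, assuming nothing about mergekit's actual GTATask/LoadTensor internals: if the same extra tensor is requested by two separate load steps, memoizing the shard read keeps the checkpoint from being read from disk twice.

```
# Illustrative sketch only -- not mergekit's internal API.
from functools import lru_cache

import torch
from safetensors.torch import load_file


@lru_cache(maxsize=None)
def load_shard(path: str) -> dict:
    """Read a safetensors shard once; later calls reuse the cached result."""
    return load_file(path)


def load_tensor(path: str, name: str) -> torch.Tensor:
    # Hypothetical helper: two "load this extra tensor" steps that hit the
    # same shard share one read from disk instead of two.
    return load_shard(path)[name]
```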

```
mergekit-moe config.yaml merge --copy-tokenizer --device cuda --low-cpu-memory --trust-remote-code
ERROR:root:No output architecture found that is compatible with the given models.
ERROR:root:All supported output architectures:
ERROR:root: * Mixtral
ERROR:root: * DeepSeek...
```
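
For comparison, a hedged sketch of the shape mergekit-moe expects; the model names below are placeholders, not recommendations. The error above generally means the expert models' architecture does not map onto one of the listed output architectures (for example, Mistral-family experts map onto a Mixtral output).

```
# Hedged example only: placeholder model names, Mistral-family experts so the
# Mixtral output architecture applies.
yaml_config = """
base_model: mistralai/Mistral-7B-v0.1
dtype: bfloat16
gate_mode: hidden
experts:
  - source_model: mistralai/Mistral-7B-Instruct-v0.2
    positive_prompts:
      - "chat"
  - source_model: mistralai/Mistral-7B-v0.1
    positive_prompts:
      - "completion"
"""

with open("config.yaml", "w", encoding="utf-8") as f:
    f.write(yaml_config)
# Then, as above: mergekit-moe config.yaml merge --copy-tokenizer
```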

It seems that Phi-3-Small models do not use Phi3ForCausalLM but rather use Phi3SmallForCausalLM. I tried coding up a config for merging the blocks but it didn't work properly on my...
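
For reference, a sketch of the kind of block-merge config presumably meant here, in mergekit's slices/passthrough YAML format; the model names and layer ranges are placeholders, and an architecture mergekit does not know about (Phi3SmallForCausalLM) can still fail regardless of how the config is written.

```
# Hypothetical block-merge config (placeholders throughout); written to a
# file, it would be passed to mergekit-yaml.
yaml_config = """
slices:
  - sources:
      - model: microsoft/Phi-3-small-8k-instruct
        layer_range: [0, 16]
  - sources:
      - model: microsoft/Phi-3-small-128k-instruct
        layer_range: [16, 32]
merge_method: passthrough
dtype: bfloat16
"""
```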

### Environment

- Conda environment: python=3.10
- mergekit commit f086664c983ad8b5f126d40ce2e4385f9e65f32c (latest as of yesterday)
- transformers from git @ git+https://github.com/huggingface/transformers [85817d98fb60977c97e3014196a462b732d2ed1a](https://github.com/huggingface/transformers/tree/85817d98fb60977c97e3014196a462b732d2ed1a) (latest as of yesterday)

Same issue with the transformers version installed by...

Just as the title says, I am trying to extract a LoRA from a Llama 3.1 70B model and it OOMs on a single 24GB GPU. Is there a way to...
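
A rough back-of-the-envelope estimate (my own arithmetic, using Llama 3.1 70B's published shapes, not mergekit output) of why this is tight on 24 GB: LoRA extraction takes an SVD of each weight delta, and the largest deltas alone need several gigabytes per fp32 copy before any SVD workspace is counted.

```
# Rough arithmetic only. The SVD needs a small multiple of one copy
# (delta, U, S, Vh, plus workspace).
shapes = {
    "embed_tokens / lm_head": (128_256, 8_192),   # vocab x hidden
    "mlp.gate_proj / up_proj": (28_672, 8_192),   # intermediate x hidden
    "self_attn.q_proj": (8_192, 8_192),           # hidden x hidden
}

BYTES_FP32 = 4
for name, (rows, cols) in shapes.items():
    gib = rows * cols * BYTES_FP32 / 2**30
    print(f"{name}: one fp32 copy ~ {gib:.2f} GiB")
```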

Resulting model weights and SLERP merge formula here: https://huggingface.co/grimjim/Gemma2-Nephilim-v3-9B An exl2 quant of the above works, but where did the extra 1B parameters come from?
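
One plausible explanation, offered tentatively: Gemma2 ties its input embedding and lm_head, so if the merged checkpoint ends up with that matrix duplicated as an untied lm_head, the reported parameter count grows by roughly the size of the embedding. Quick arithmetic with Gemma2-9B's published config values:

```
# My own arithmetic, assuming Gemma2-9B's published config values.
vocab_size = 256_000   # Gemma2 vocabulary size
hidden_size = 3_584    # Gemma2-9B hidden dimension

extra_params = vocab_size * hidden_size
print(f"duplicated embedding / lm_head matrix: {extra_params / 1e9:.2f}B parameters")
# -> about 0.92B, i.e. roughly the "extra 1B" in question
```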

The parameter settings in the examples are too simple, and it's hard to work out how to set parameters for the different methods. For example, `task_arithmetic` is missing. How to merge different layers with...
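
As a stopgap, a hedged illustration of what a `task_arithmetic` config can look like; the model names are placeholders, and per-model `weight` plus `normalize` are the knobs this family of methods typically exposes.

```
# Hedged illustration only: placeholder model names.
yaml_config = """
models:
  - model: your-org/finetune-a
    parameters:
      weight: 0.7
  - model: your-org/finetune-b
    parameters:
      weight: 0.3
merge_method: task_arithmetic
base_model: your-org/base-model
parameters:
  normalize: false
dtype: float16
"""
# Written to a file, this would be passed to mergekit-yaml.
```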

## Fixes
1. Uncaught dataset loader bug that pops up because padding wasn't set correctly (a generic sketch of this bug class follows below)

## Improves
1. Use of the transpose law to simplify an expression
2. Removes unnecessary complexity and...
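
For context, a generic sketch of the bug class named in the first fix, not necessarily this PR's exact change: batched tokenization misbehaves when a tokenizer has no pad token configured, and the usual remedy is to fall back to the EOS token.

```
# Generic sketch, not this PR's code. "gpt2" is just a placeholder model
# whose tokenizer ships without a pad token.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

batch = tokenizer(
    ["a short example", "a somewhat longer example sentence"],
    padding=True,
    return_tensors="pt",
)
```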

```
import yaml

yaml_config = """
base_model: SameedHussain/phi-3.8-flight
dtype: float16
gate_mode: hidden
experts:
  - source_model: SameedHussain/phi-3.8-flight
    positive_prompts:
      - "flight"
  - source_model: SameedHussain/phi-3.8-hotel
    positive_prompts:
      - "hotel"
"""

# Save config as...
```

Implemented:
- Framework to compute metrics based on layer weights using existing mergekit infrastructure (run_measure is based on run_merge, metric_methods based on merge_methods, etc.)
- plot_tools.MetricsHandler to load metrics output,...