mergekit

Tools for merging pretrained large language models.

Results: 231 mergekit issues, sorted by recently updated

Thank you for open-sourcing such a great tool. When executing the GTATask, I use LoadTensor twice to load additional tensors, but this makes the execution very slow. How can I...
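
A minimal, self-contained sketch of the caching idea behind this question, assuming nothing about mergekit's actual GTATask/LoadTensor internals: if the same extra tensor is requested by two separate load steps, memoizing the shard read keeps the checkpoint from being read from disk twice.

```
# Illustrative sketch only -- not mergekit's internal API.
from functools import lru_cache

import torch
from safetensors.torch import load_file


@lru_cache(maxsize=None)
def load_shard(path: str) -> dict:
    """Read a safetensors shard once; later calls reuse the cached result."""
    return load_file(path)


def load_tensor(path: str, name: str) -> torch.Tensor:
    # Hypothetical helper: two "load this extra tensor" steps that hit the
    # same shard share one read from disk instead of two.
    return load_shard(path)[name]
```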

```
mergekit-moe config.yaml merge --copy-tokenizer --device cuda --low-cpu-memory --trust-remote-code
ERROR:root:No output architecture found that is compatible with the given models.
ERROR:root:All supported output architectures:
ERROR:root: * Mixtral
ERROR:root: * DeepSeek...
```
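
For comparison, a hedged sketch of the shape mergekit-moe expects; the model names below are placeholders, not recommendations. The error above generally means the expert models' architecture does not map onto one of the listed output architectures (for example, Mistral-family experts map onto a Mixtral output).

```
# Hedged example only: placeholder model names, Mistral-family experts so the
# Mixtral output architecture applies.
yaml_config = """
base_model: mistralai/Mistral-7B-v0.1
dtype: bfloat16
gate_mode: hidden
experts:
  - source_model: mistralai/Mistral-7B-Instruct-v0.2
    positive_prompts:
      - "chat"
  - source_model: mistralai/Mistral-7B-v0.1
    positive_prompts:
      - "completion"
"""

with open("config.yaml", "w", encoding="utf-8") as f:
    f.write(yaml_config)
# Then, as above: mergekit-moe config.yaml merge --copy-tokenizer
```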

It seems that Phi-3-Small models do not use Phi3ForCausalLM but rather use Phi3SmallForCausalLM. I tried coding up a config for merging the blocks but it didn't work properly on my...
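
For reference, a sketch of the kind of block-merge config presumably meant here, in mergekit's slices/passthrough YAML format; the model names and layer ranges are placeholders, and an architecture mergekit does not know about (Phi3SmallForCausalLM) can still fail regardless of how the config is written.

```
# Hypothetical block-merge config (placeholders throughout); written to a
# file, it would be passed to mergekit-yaml.
yaml_config = """
slices:
  - sources:
      - model: microsoft/Phi-3-small-8k-instruct
        layer_range: [0, 16]
  - sources:
      - model: microsoft/Phi-3-small-128k-instruct
        layer_range: [16, 32]
merge_method: passthrough
dtype: bfloat16
"""
```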

### Environment

- Conda environment: python=3.10
- mergekit commit f086664c983ad8b5f126d40ce2e4385f9e65f32c (latest as of yesterday)
- transformers from git @ git+https://github.com/huggingface/transformers [85817d98fb60977c97e3014196a462b732d2ed1a](https://github.com/huggingface/transformers/tree/85817d98fb60977c97e3014196a462b732d2ed1a) (latest as of yesterday)

Same issue with the transformers version installed by...

Just as the title says, I am trying to extract a LoRA from a Llama 3.1 70B model and it OOMs on a single 24GB GPU. Is there a way to...
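
A rough back-of-the-envelope estimate (my own arithmetic, using Llama 3.1 70B's published shapes, not mergekit output) of why this is tight on 24 GB: LoRA extraction takes an SVD of each weight delta, and the largest deltas alone need several gigabytes per fp32 copy before any SVD workspace is counted.

```
# Rough arithmetic only. The SVD needs a small multiple of one copy
# (delta, U, S, Vh, plus workspace).
shapes = {
    "embed_tokens / lm_head": (128_256, 8_192),   # vocab x hidden
    "mlp.gate_proj / up_proj": (28_672, 8_192),   # intermediate x hidden
    "self_attn.q_proj": (8_192, 8_192),           # hidden x hidden
}

BYTES_FP32 = 4
for name, (rows, cols) in shapes.items():
    gib = rows * cols * BYTES_FP32 / 2**30
    print(f"{name}: one fp32 copy ~ {gib:.2f} GiB")
```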

Resulting model weights and SLERP merge formula here: https://huggingface.co/grimjim/Gemma2-Nephilim-v3-9B An exl2 quant of the above works, but where did the extra 1B parameters come from?
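
One plausible explanation, offered tentatively: Gemma2 ties its input embedding and lm_head, so if the merged checkpoint ends up with that matrix duplicated as an untied lm_head, the reported parameter count grows by roughly the size of the embedding. Quick arithmetic with Gemma2-9B's published config values:

```
# My own arithmetic, assuming Gemma2-9B's published config values.
vocab_size = 256_000   # Gemma2 vocabulary size
hidden_size = 3_584    # Gemma2-9B hidden dimension

extra_params = vocab_size * hidden_size
print(f"duplicated embedding / lm_head matrix: {extra_params / 1e9:.2f}B parameters")
# -> about 0.92B, i.e. roughly the "extra 1B" in question
```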

The parameter settings in the examples are too simple, and it's hard to work out how to set parameters for the different methods. For example, `task_arithmetic` is missing. How to merge different layers with...
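
As a stopgap, a hedged illustration of what a `task_arithmetic` config can look like; the model names are placeholders, and per-model `weight` plus `normalize` are the knobs this family of methods typically exposes.

```
# Hedged illustration only: placeholder model names.
yaml_config = """
models:
  - model: your-org/finetune-a
    parameters:
      weight: 0.7
  - model: your-org/finetune-b
    parameters:
      weight: 0.3
merge_method: task_arithmetic
base_model: your-org/base-model
parameters:
  normalize: false
dtype: float16
"""
# Written to a file, this would be passed to mergekit-yaml.
```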

## Fixes
1. Uncaught dataset loader bug that pops up because padding wasn't set correctly (a generic sketch of this bug class follows below)

## Improves
1. Use of the transpose law to simplify an expression
2. Removes unnecessary complexity and...
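
For context, a generic sketch of the bug class named in the first fix, not necessarily this PR's exact change: batched tokenization misbehaves when a tokenizer has no pad token configured, and the usual remedy is to fall back to the EOS token.

```
# Generic sketch, not this PR's code. "gpt2" is just a placeholder model
# whose tokenizer ships without a pad token.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

batch = tokenizer(
    ["a short example", "a somewhat longer example sentence"],
    padding=True,
    return_tensors="pt",
)
```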

```
import yaml

yaml_config = """
base_model: SameedHussain/phi-3.8-flight
dtype: float16
gate_mode: hidden
experts:
  - source_model: SameedHussain/phi-3.8-flight
    positive_prompts:
      - "flight"
  - source_model: SameedHussain/phi-3.8-hotel
    positive_prompts:
      - "hotel"
"""

# Save config as...
```

Implemented:
- Framework to compute metrics based on layer weights using existing mergekit infrastructure (run_measure is based on run_merge, metric_methods based on merge_methods, etc.)
- plot_tools.MetricsHandler to load metrics output,...