mergekit
Tools for merging pretrained large language models.
Hi, for the Mixtral `positive_prompts`, what is the optimal prompt? (i.e. should it be general keywords, or a description?) For example, should it be:
```yaml
- "tech"
- "coding"
-...
```
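For reference, here is a minimal mergekit-moe sketch showing both styles side by side; the base model, expert names, and prompts below are hypothetical, not taken from any real config:

```yaml
# Hypothetical mergekit-moe config illustrating the two prompt styles in question.
base_model: mistralai/Mistral-7B-v0.1      # assumed base model, for illustration only
gate_mode: hidden
dtype: bfloat16
experts:
  - source_model: example/code-expert-7b   # hypothetical expert
    positive_prompts:                      # keyword style
      - "tech"
      - "coding"
      - "python"
  - source_model: example/chat-expert-7b   # hypothetical expert
    positive_prompts:                      # descriptive style
      - "Answer general questions in a friendly, conversational tone."
```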
WIN10 RTX4060. Hey... so first, can I do all of this offline, and is it only for GPTQ, because GGUF has no specific config file?! And how exactly must the...
Thank you very much for this open-source project; it's an excellent tool for merging large language models. I have a few questions: 1. Could you provide a simple overview of...
I just freshly cloned mergekit. With the following config:
```
models:
  - model: /home/alpha/Storage/Models/Raw/chargoddard_Yi-34B-200K-Llama
    # No parameters necessary for base model
  - model: /home/alpha/Storage/Models/Raw/Nous-Capybara-34B
    parameters:
      weight: 0.5
      density: 0.68
  -...
```
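For context, a config of this shape usually also declares the merge method and base model at the top level; here is a minimal sketch under that assumption, with hypothetical paths, values, and method rather than the ones from the issue above:

```yaml
models:
  - model: path/to/base-model
    # No parameters necessary for base model
  - model: path/to/finetune
    parameters:
      weight: 0.5
      density: 0.68
merge_method: dare_ties    # assumed method; ties is configured the same way
base_model: path/to/base-model
dtype: bfloat16
```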
Can we use algorithms to automatically optimize how a model's weights and layers are merged, following the most efficient path?
Hi @cg123, when using `slerp` I was confused about the parameter `t` in the `yaml` file. In [slerp.py](https://github.com/cg123/mergekit/blob/main/mergekit/merge_methods/slerp.py#L64), the parameter `t` is a `float` or `np.ndarray`, but I noticed in this...
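For reference, `t` can be given in the YAML either as a single value or as a per-filter gradient that is interpolated across the layer range, which is presumably why the code accepts both a `float` and an `np.ndarray`. A sketch modelled on the README's slerp example, with placeholder model paths and layer counts:

```yaml
slices:
  - sources:
      - model: path/to/model-a
        layer_range: [0, 32]
      - model: path/to/model-b
        layer_range: [0, 32]
merge_method: slerp
base_model: path/to/model-a
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]   # gradient across layers for attention tensors
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]   # a different gradient for MLP tensors
    - value: 0.5                     # single float fallback for everything else
dtype: bfloat16
```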
Adding the `--lazy_unpickle` flag when performing a merge consumes less memory. What is the principle behind this? Also, does it reduce accuracy?
As far as I understand, in this [line](https://github.com/cg123/mergekit/blob/mixtral/mergekit/scripts/mixtral_moe.py#L137) it seems like we subtract negative prompt embeddings from positive prompt embeddings. What is the reason for this? @cg123 @DocShotgun @q5sys
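For anyone following along, those embeddings come from the per-expert prompt lists in the mergekit-moe config; `negative_prompts` are presumably subtracted so that prompts an expert should not handle push its gate score down. A sketch of where the two lists are declared (expert name and prompts are hypothetical):

```yaml
experts:
  - source_model: example/code-expert-7b   # hypothetical expert
    positive_prompts:                      # prompts that should route to this expert
      - "write a python function"
    negative_prompts:                      # prompts that should not route to this expert
      - "tell me a story"
```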
Forgive my ignorance, but before mergekit I only knew of merging QLoRA weights, and then I came across https://github.com/Gryphe/BlockMerge_Gradient. I've been tracking a lot of papers on how to...