
Tools for merging pretrained large language models.

231 mergekit issues (sorted by most recently updated)

Hi, for the Mixtral `positive_prompts`, what is the optimal prompt style? (i.e., should it be general keywords, or a description?) For example, should it be:
```yaml
- "tech"
- "coding"
-...
```
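For context, here is a hedged sketch contrasting the two styles the question describes, using mergekit-moe's expert config keys (`base_model`, `experts`, `source_model`, `positive_prompts`); the model names are placeholders, not from the original issue:

```yaml
# Sketch only: keyword-style vs. description-style prompts. Model paths are hypothetical.
base_model: mistralai/Mistral-7B-v0.1
experts:
  - source_model: example-org/code-model          # placeholder
    positive_prompts:                             # keyword style
      - "coding"
      - "python"
  - source_model: example-org/chat-model          # placeholder
    positive_prompts:                             # description style
      - "a friendly assistant that answers general knowledge questions"
```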

Win10, RTX 4060. Hey... so first, can I do all of this offline? And is it only for GPTQ, since GGUF has no specific config file? And how exactly must the...

Thank you very much for this open-source project; it's an excellent tool for merging large language models. I have a few questions: 1. Could you provide a simple overview of...

I just freshly cloned mergekit. With the following config:
```
models:
  - model: /home/alpha/Storage/Models/Raw/chargoddard_Yi-34B-200K-Llama # No parameters necessary for base model
  - model: /home/alpha/Storage/Models/Raw/Nous-Capybara-34B
    parameters:
      weight: 0.5
      density: 0.68
  -...
```
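For reference, a complete config in this style typically also names a merge method and base model. A minimal hedged sketch follows; the `dare_ties` method, paths, and dtype are assumptions for illustration, not taken from the truncated issue:

```yaml
# Hedged sketch of a full mergekit config in the same style; paths and method are assumptions.
models:
  - model: /path/to/base-model          # no parameters necessary for base model
  - model: /path/to/finetuned-model
    parameters:
      weight: 0.5      # contribution of this model to the merge
      density: 0.68    # fraction of delta weights retained
merge_method: dare_ties
base_model: /path/to/base-model
dtype: bfloat16
```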

Can we use algorithms to automatically optimize the merging of the weights and layers of the model along the most efficient path?

Hi @cg123, when using `slerp` I was confused by the parameter `t` in the `yaml` file. In [slerp.py](https://github.com/cg123/mergekit/blob/main/mergekit/merge_methods/slerp.py#L64), the parameter `t` is a `float` or `np.ndarray`, but I noticed in this...
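For reference, this is the standard SLERP formula such an implementation computes. A minimal NumPy sketch (not mergekit's actual code) in which `t` may be a scalar or an array that broadcasts against the vectors:

```python
import numpy as np

def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between flat vectors v0 and v1.

    t may be a float or an np.ndarray broadcastable against the vectors.
    Minimal sketch only; mergekit's version handles more edge cases.
    """
    v0n = v0 / (np.linalg.norm(v0) + eps)
    v1n = v1 / (np.linalg.norm(v1) + eps)
    dot = np.clip(np.dot(v0n, v1n), -1.0, 1.0)
    omega = np.arccos(dot)            # angle between the two directions
    if np.abs(omega) < eps:           # nearly parallel: fall back to lerp
        return (1.0 - t) * v0 + t * v1
    so = np.sin(omega)
    return (np.sin((1.0 - t) * omega) / so) * v0 + (np.sin(t * omega) / so) * v1
```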

Adding the `--lazy_unpickle` flag when performing a merge consumes less memory. What is the principle behind this? Also, does it reduce accuracy?
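The general principle (a hedged analogy, not a description of mergekit's internals) is lazy, on-demand tensor loading: rather than deserializing an entire checkpoint into RAM at once, only the tensor currently being merged is materialized. The `safetensors` library exposes the same idea directly:

```python
# Sketch of on-demand tensor loading with safetensors; the file path is hypothetical.
from safetensors import safe_open

with safe_open("model-00001-of-00002.safetensors", framework="pt", device="cpu") as f:
    for name in f.keys():
        tensor = f.get_tensor(name)   # only this tensor is materialized in memory
        # ... merge the tensor, then let it go out of scope to free the memory ...
```

Since lazy loading only changes when tensors are read, not their values, it should not by itself affect the accuracy of the merged model.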

As far as I understand, in this [line](https://github.com/cg123/mergekit/blob/mixtral/mergekit/scripts/mixtral_moe.py#L137) it seems like we subtract the negative prompt embeddings from the positive prompt embeddings. What is the reason for this? @cg123 @DocShotgun @q5sys
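A minimal sketch of the idea being asked about (illustrative only; the shapes and averaging here are assumptions, not mergekit's exact code): subtracting the mean negative-prompt embedding from the mean positive-prompt embedding yields a routing direction that points toward the positive concept while canceling components shared with the negatives:

```python
import numpy as np

hidden_dim = 4096                        # hypothetical hidden size
pos = np.random.randn(3, hidden_dim)     # hidden states for 3 positive prompts
neg = np.random.randn(2, hidden_dim)     # hidden states for 2 negative prompts

# The difference suppresses directions common to both prompt sets, so the
# gate responds to what is distinctive about the positive prompts.
gate_vector = pos.mean(axis=0) - neg.mean(axis=0)
```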

Forgive my ignorance, but before mergekit I only knew of merging QLoRA weights, and then I came across https://github.com/Gryphe/BlockMerge_Gradient. I've been tracking a lot of papers on how to...