mergekit
Tools for merging pretrained large language models.
Hi, for the Mixtral `positive_prompts`, what is the optimal prompt? (i.e. should it be general keywords, or a description?) For example, should it be:
```yaml
- "tech"
- "coding"
-...
```
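For reference, here is a minimal mergekit-moe sketch showing both styles side by side; the base model, expert names, and prompts below are hypothetical, not taken from any real config:

```yaml
# Hypothetical mergekit-moe config illustrating the two prompt styles in question.
base_model: mistralai/Mistral-7B-v0.1      # assumed base model, for illustration only
gate_mode: hidden
dtype: bfloat16
experts:
  - source_model: example/code-expert-7b   # hypothetical expert
    positive_prompts:                      # keyword style
      - "tech"
      - "coding"
      - "python"
  - source_model: example/chat-expert-7b   # hypothetical expert
    positive_prompts:                      # descriptive style
      - "Answer general questions in a friendly, conversational tone."
```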
WIN10 RTX4060. Hey... so first, can I do all of this offline, and is it only for GPTQ, because GGUF has no specific config file?! And how exactly must the...
Thank you very much for this open-source project; it's an excellent tool for merging large language models. I have a few questions: 1. Could you provide a simple overview of...
I just freshly cloned mergekit. With the following config:
```
models:
  - model: /home/alpha/Storage/Models/Raw/chargoddard_Yi-34B-200K-Llama
    # No parameters necessary for base model
  - model: /home/alpha/Storage/Models/Raw/Nous-Capybara-34B
    parameters:
      weight: 0.5
      density: 0.68
  -...
```
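For context, a config of this shape usually also declares the merge method and base model at the top level; here is a minimal sketch under that assumption, with hypothetical paths, values, and method rather than the ones from the issue above:

```yaml
models:
  - model: path/to/base-model
    # No parameters necessary for base model
  - model: path/to/finetune
    parameters:
      weight: 0.5
      density: 0.68
merge_method: dare_ties    # assumed method; ties is configured the same way
base_model: path/to/base-model
dtype: bfloat16
```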
Can we use algorithms to automatically optimize how a model's weights and layers are merged, following the most efficient path?
Hi @cg123, when using `slerp` I was confused about the parameter `t` in the `yaml` file. In [slerp.py](https://github.com/cg123/mergekit/blob/main/mergekit/merge_methods/slerp.py#L64), the parameter `t` is a `float` or `np.ndarray`, but I noticed in this...
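For reference, `t` can be given in the YAML either as a single value or as a per-filter gradient that is interpolated across the layer range, which is presumably why the code accepts both a `float` and an `np.ndarray`. A sketch modelled on the README's slerp example, with placeholder model paths and layer counts:

```yaml
slices:
  - sources:
      - model: path/to/model-a
        layer_range: [0, 32]
      - model: path/to/model-b
        layer_range: [0, 32]
merge_method: slerp
base_model: path/to/model-a
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]   # gradient across layers for attention tensors
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]   # a different gradient for MLP tensors
    - value: 0.5                     # single float fallback for everything else
dtype: bfloat16
```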
Adding the `--lazy_unpickle` flag when performing a merge consumes less memory. What is the principle behind this? Also, does it reduce accuracy?
As far as I understand, in this [line](https://github.com/cg123/mergekit/blob/mixtral/mergekit/scripts/mixtral_moe.py#L137) it seems like we subtract negative prompt embeddings from positive prompt embeddings. What is the reason for this? @cg123 @DocShotgun @q5sys
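For anyone following along, those embeddings come from the per-expert prompt lists in the mergekit-moe config; `negative_prompts` are presumably subtracted so that prompts an expert should not handle push its gate score down. A sketch of where the two lists are declared (expert name and prompts are hypothetical):

```yaml
experts:
  - source_model: example/code-expert-7b   # hypothetical expert
    positive_prompts:                      # prompts that should route to this expert
      - "write a python function"
    negative_prompts:                      # prompts that should not route to this expert
      - "tell me a story"
```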
Forgive my ignorance, but before mergekit I only knew of merging QLoRA weights, and then I came across https://github.com/Gryphe/BlockMerge_Gradient. I've been tracking a lot of papers on how to...