mergekit

Tools for merging pretrained large language models.

Results: 231 mergekit issues, sorted by most recently updated.

Hi, sorry for asking so many questions, but do you know if it's possible to "unmerge" a MoE model and extract each expert as a separate model? For example, could...
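There is no built-in "unmerge" command as far as I know, but as a rough illustration of what extracting a single expert would involve, here is a minimal sketch that copies one expert's MLP weights out of a Mixtral-style state dict into Mistral-style key names. The key patterns and the w1/w3/w2 → gate_proj/up_proj/down_proj correspondence are assumptions about the Mixtral checkpoint layout, not mergekit functionality.

```python
# Minimal sketch: copy one expert's MLP weights out of a Mixtral-style
# checkpoint into Mistral-style key names. The key patterns below are
# assumptions about the Mixtral layout, not part of mergekit.
import re
import torch

def extract_expert(state_dict: dict, expert_idx: int) -> dict:
    # Assumed mapping from Mixtral expert projections to Mistral MLP names.
    proj_map = {"w1": "gate_proj", "w3": "up_proj", "w2": "down_proj"}
    pattern = re.compile(
        rf"model\.layers\.(\d+)\.block_sparse_moe\.experts\.{expert_idx}\.(w[123])\.weight"
    )
    out = {}
    for key, tensor in state_dict.items():
        m = pattern.fullmatch(key)
        if m:
            layer, proj = m.group(1), proj_map[m.group(2)]
            out[f"model.layers.{layer}.mlp.{proj}.weight"] = tensor
        elif "block_sparse_moe" not in key:
            # Attention, norms, and embeddings are shared, so copy them as-is.
            out[key] = tensor
    return out

# Usage (hypothetical paths):
# sd = torch.load("mixtral/pytorch_model.bin", map_location="cpu")
# torch.save(extract_expert(sd, 0), "expert0/pytorch_model.bin")
```

Note that the router (`block_sparse_moe.gate`) is dropped, so the extracted expert behaves as a plain dense MLP and may not match how the expert behaved inside the MoE.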

Are there any plans to add the PESC method described in this [paper](https://arxiv.org/abs/2401.02731), which gave birth to these models: [Camelidae-8x7B](https://huggingface.co/hywu/Camelidae-8x7B), [Camelidae-8x13B](https://huggingface.co/hywu/Camelidae-8x13B), and [Camelidae-8x34B](https://huggingface.co/hywu/Camelidae-8x34B)? Check their repo [here](https://github.com/wuhy68/Parameter-Efficient-MoE/tree/master).

Whenever I make a Mistral model using two Llama-2 13B models, I get the following error message:
```
Traceback (most recent call last):
  File "C:\Users\username\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code,...
```

Hi there, a question about the process of merging different LLMs into a MoE. For mergekit-moe, if we use the 'hidden' gate method, we have to provide at least one positive...
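For reference, the `hidden` gate mode initializes the routers from hidden-state representations of the prompts, so each expert needs at least one positive prompt. A minimal config sketch follows; the model names and prompts are illustrative placeholders, written out as YAML because mergekit-moe reads a YAML config file.

```python
# Sketch of a mergekit-moe config using the "hidden" gate mode.
# Model names and prompts are placeholders, not recommendations.
import yaml

config = {
    "base_model": "mistralai/Mistral-7B-v0.1",
    "gate_mode": "hidden",  # router init from hidden-state prompt representations
    "dtype": "bfloat16",
    "experts": [
        {
            "source_model": "teknium/OpenHermes-2.5-Mistral-7B",
            "positive_prompts": ["You are a helpful general assistant."],
        },
        {
            "source_model": "WizardLM/WizardMath-7B-V1.1",
            "positive_prompts": ["Solve this math problem step by step."],
        },
    ],
}

with open("moe-config.yml", "w") as f:
    yaml.safe_dump(config, f, sort_keys=False)

# Then run something like: mergekit-moe moe-config.yml ./output-moe
```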

I know that Phi uses different names for its parameters, but the architecture is still a transformer containing self-attention and an MLP. If we rename the parameters to follow the Llama/Mistral format, is...
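The renaming itself is mechanically simple, as the sketch below shows, but structural differences (for example a fused QKV projection or partial rotary embeddings) mean a pure rename is unlikely to yield a working Llama/Mistral checkpoint. The key mapping here is purely hypothetical and does not reflect Phi's actual parameter names.

```python
# Illustrative sketch only: rename checkpoint keys from one naming scheme
# to another via substring replacement. The mapping below is hypothetical
# and does NOT reflect Phi's real parameter names; structural differences
# (fused QKV, partial rotary, etc.) are not handled by renaming alone.
import torch

# Hypothetical old-fragment -> new-fragment mapping.
RENAME_MAP = {
    "transformer.h.": "model.layers.",
    ".mlp.fc1.": ".mlp.up_proj.",
    ".mlp.fc2.": ".mlp.down_proj.",
}

def rename_keys(state_dict: dict) -> dict:
    renamed = {}
    for key, tensor in state_dict.items():
        new_key = key
        for old, new in RENAME_MAP.items():
            new_key = new_key.replace(old, new)
        renamed[new_key] = tensor
    return renamed

# Usage (hypothetical paths):
# sd = torch.load("phi/pytorch_model.bin", map_location="cpu")
# torch.save(rename_keys(sd), "renamed/pytorch_model.bin")
```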

Hi, I'm trying to add Qwen-MoE into mixtral_moe.py and have made some modifications, but now I'm running into some problems. ![1](https://github.com/cg123/mergekit/assets/53638291/000d5134-0fe0-4ba5-ad4c-745974c3dbee) I think it is wrong, because auto_map...

```
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
C:\Users\irene\AppData\Local\Programs\Python\Python310\lib\site-packages\bitsandbytes\cuda_setup\main.py:136: UserWarning:...
```

Getting this error:
```
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
lida...
```

Hi @cg123, would it be feasible to merge models from different pre-trained backbones? For example, can we merge a model fine-tuned on Mistral-7B with a model fine-tuned on Llama-2-7B? Or...

This could be due to my own mistake, but I can't for the life of me figure out why the merge works and then the tokenizer merge fails.
```
Traceback...
```