mergekit
Tools for merging pretrained large language models.
Hi, sorry for asking so many questions, but do you know if it's possible to "unmerge" a MoE model and extract each expert as a separate model? For example, could...
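A minimal sketch of one way to approach this, assuming the standard Hugging Face parameter names for Mixtral-style MoE checkpoints (`block_sparse_moe.experts.{j}.w1/w2/w3`) and for dense Mistral MLPs (`gate_proj`/`up_proj`/`down_proj`); this is not a built-in mergekit feature. The idea is to copy one expert's MLP weights onto the dense names, keep the shared attention/norm/embedding tensors, and drop the router and the remaining experts.

```python
import re
import torch

# Mixtral's per-expert projections vs. the dense Mistral MLP names they map to
# (assumed standard Hugging Face naming).
PROJ_MAP = {"w1": "gate_proj", "w2": "down_proj", "w3": "up_proj"}

def extract_expert(state_dict: dict[str, torch.Tensor], expert: int) -> dict[str, torch.Tensor]:
    """Build a dense Mistral-style state dict containing only the chosen expert."""
    dense = {}
    pattern = re.compile(
        rf"^model\.layers\.(\d+)\.block_sparse_moe\.experts\.{expert}\.(w[123])\.weight$"
    )
    for name, tensor in state_dict.items():
        match = pattern.match(name)
        if match:
            layer, proj = match.groups()
            dense[f"model.layers.{layer}.mlp.{PROJ_MAP[proj]}.weight"] = tensor
        elif "block_sparse_moe" in name:
            continue  # router and the other experts are dropped
        else:
            dense[name] = tensor  # attention, norms, embeddings are shared
    return dense
```

To turn the result into a loadable model you would still need a Mistral config whose `intermediate_size` matches the expert MLP, and since the experts were trained jointly with the router, an extracted expert is not guaranteed to be useful on its own.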
Are there any plans to add the PESC method described in this [paper](https://arxiv.org/abs/2401.02731), which gave birth to these models: [Camelidae-8x7B](https://huggingface.co/hywu/Camelidae-8x7B), [Camelidae-8x13B](https://huggingface.co/hywu/Camelidae-8x13B), and [Camelidae-8x34B](https://huggingface.co/hywu/Camelidae-8x34B)? Check their repo [here](https://github.com/wuhy68/Parameter-Efficient-MoE/tree/master).
Whenever I make a Mistral model using two Llama-2 13B models, I get the following error message: Traceback (most recent call last): File "C:\Users\username\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main return _run_code(code,...
Hi there, a question about the process of merging different LLMs into an MoE. For mergekit-moe, if we use the 'hidden' gate method, we have to provide at least one positive...
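For context, a rough conceptual sketch of what hidden-state gate initialization is usually described as doing: embed each expert's positive prompts with the base model and use the averaged hidden states as that expert's router row. The model name and prompts below are placeholders, and this is an illustration rather than mergekit's actual implementation (which also supports negative prompts).

```python
import torch
from transformers import AutoModel, AutoTokenizer

BASE = "mistralai/Mistral-7B-v0.1"  # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(BASE)
model = AutoModel.from_pretrained(BASE, torch_dtype=torch.bfloat16)
model.eval()

def prompt_embedding(prompts: list[str]) -> torch.Tensor:
    """Mean of last-layer hidden states over all tokens of all prompts."""
    vecs = []
    for text in prompts:
        inputs = tokenizer(text, return_tensors="pt")
        with torch.no_grad():
            out = model(**inputs)
        vecs.append(out.last_hidden_state.squeeze(0).mean(dim=0))
    return torch.stack(vecs).mean(dim=0)

# One list of positive prompts per expert (illustrative prompts).
positive_prompts = [
    ["Write a Python function that", "Fix the bug in this code"],   # expert 0
    ["Tell me a short story about", "Write a poem about the sea"],  # expert 1
]

# Router weight of shape (num_experts, hidden_size): one row per expert.
router_weight = torch.stack([prompt_embedding(p) for p in positive_prompts])
```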
I know that Phi uses different names for its parameters, but the architecture is still a transformer containing self-attention and an MLP. If we rename the parameters to follow the Llama/Mistral format, is...
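Mechanically, renaming is just a key-mapping pass over the state dict; a sketch is below, with a hypothetical and incomplete rule table that would need to be filled in from the actual key names of both checkpoints.

```python
import re
import torch

# Hypothetical (pattern, replacement) pairs -- illustrative only; fill these in
# after inspecting `model.state_dict().keys()` for both architectures.
RENAME_RULES = [
    (r"^transformer\.h\.(\d+)\.", r"model.layers.\1."),
    (r"\.mixer\.out_proj\.", ".self_attn.o_proj."),
]

def rename_keys(state_dict: dict[str, torch.Tensor]) -> dict[str, torch.Tensor]:
    """Apply every rename rule, in order, to each checkpoint key."""
    renamed = {}
    for key, tensor in state_dict.items():
        new_key = key
        for pattern, repl in RENAME_RULES:
            new_key = re.sub(pattern, repl, new_key)
        renamed[new_key] = tensor
    return renamed
```

Renaming alone doesn't change the architecture, though: Phi's feed-forward block is a plain fc1/fc2 MLP with no gate projection, so there is no tensor to map onto Llama/Mistral's gate_proj, and the remaining module shapes still have to line up.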
Hi, I'm trying to add Qwen-MoE support to mixtral_moe.py and have made some modifications, but now I'm running into some problems. I think it is wrong, because auto_map...
``` ===================================BUG REPORT=================================== Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues ================================================================================ CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64... C:\Users\irene\AppData\Local\Programs\Python\Python310\lib\site-packages\bitsandbytes\cuda_setup\main.py:136: UserWarning:...
Getting this error: ``` ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts. lida...
Hi @cg123, would it be feasible to merge models from different pre-trained backbones? For example, can we merge a model fine-tuned on Mistral-7B with a model fine-tuned on Llama-2-7B? Or...
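As a quick sanity check, most weight-space merge methods operate tensor-by-tensor and need the two models to have identical parameter names and shapes, so one way to see whether two backbones are even candidates is to compare their configs. A small sketch (the repo IDs are just examples):

```python
from transformers import AutoConfig

# Fields that must match for the tensors to line up one-to-one.
SHAPE_FIELDS = [
    "hidden_size", "num_hidden_layers", "num_attention_heads",
    "num_key_value_heads", "intermediate_size", "vocab_size",
]

def compatible(repo_a: str, repo_b: str) -> bool:
    a = AutoConfig.from_pretrained(repo_a)
    b = AutoConfig.from_pretrained(repo_b)
    return all(getattr(a, f, None) == getattr(b, f, None) for f in SHAPE_FIELDS)

# Example (the Llama-2 repo is gated, so this needs authenticated Hub access):
# compatible("mistralai/Mistral-7B-v0.1", "meta-llama/Llama-2-7b-hf")
# -> False: they differ in intermediate_size and the number of KV heads.
```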
This could be due to my own mistake, but I can't for the life of me figure out why the merge works and then the tokenizer merge fails. ``` Traceback...