mergekit
Tools for merging pretrained large language models.
I used mergekit-moe to build a MoE model from several copies of the same Gemma model (gate mode: hidden), but the resulting model produces meaningless output like that shown in https://github.com/arcee-ai/mergekit/issues/218#issuecomment-2027402773. This didn't happen when I merged models...
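For reference, a minimal mergekit-moe configuration with a hidden-state gate might look like the sketch below; the model name and prompts are illustrative placeholders, not the reporter's actual setup.

```yaml
# Hypothetical mergekit-moe config: several experts cloned from the same base
# model, with the router initialized from hidden-state activations.
base_model: google/gemma-7b
gate_mode: hidden        # other options include cheap_embed and random
dtype: bfloat16
experts:
  - source_model: google/gemma-7b
    positive_prompts:
      - "general conversation"
  - source_model: google/gemma-7b
    positive_prompts:
      - "code and math"
```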
Are there any plans to support fine-grained experts in the future? Fine-grained experts are a technique adopted in projects like Qwen MoE and DeepSeek MoE, and they have shown promising results....
I am trying to condense a model by 1/4. I want to merge every 4th layer onto the previous 3 layers. When I try this I get 0 layers on...
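Layer selection in mergekit goes through the slices syntax; below is a minimal sketch of dropping layers with a passthrough merge. The model name and layer ranges are illustrative, and note this keeps a contiguous subset of layers rather than averaging each 4th layer onto its neighbors.

```yaml
# Hypothetical passthrough config that keeps only part of the layer stack.
# Averaging every 4th layer onto the previous 3 would need a different approach.
slices:
  - sources:
      - model: mistralai/Mistral-7B-v0.1
        layer_range: [0, 24]   # keep the first 24 of 32 layers
merge_method: passthrough
dtype: bfloat16
```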
BitNet is now supported in many architectures. A good first step would be to add support for it to mergekit.
Hi team, could you please tell me which merging method you used for Arcee-Spark? Thanks.
Hello! I have two models, CodeLLaMa-13b-Python and CodeLLaMa-13b, that need to be merged. The overall goal is to merge two models (one trained on Python and another trained...
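A straightforward way to combine two sibling checkpoints like these is a weighted linear merge; the sketch below is illustrative only, with assumed Hugging Face repo names and equal weights.

```yaml
# Hypothetical linear merge of the Python-specialized and base CodeLlama checkpoints.
models:
  - model: codellama/CodeLlama-13b-Python-hf
    parameters:
      weight: 0.5
  - model: codellama/CodeLlama-13b-hf
    parameters:
      weight: 0.5
merge_method: linear
dtype: float16
```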
Considering I have metered internet and limited resources, I followed your guide and the notebook. I used this yaml:
```yaml
slices:
  - sources:
      - model: mistralai/Mistral-7B-Instruct-v0.3
        layer_range: [0,...
```
Hello, the documentation seems to be a little sparse on this, or the feature doesn't exist yet: is there a way to pass lm-eval-harness arguments directly while running mergekit-evolve? For instance: the...
Hi authors, you may want to consider implementing this in the toolkit: https://arxiv.org/pdf/2406.07529v1 (code: https://github.com/luli-git/MAP). It's an advanced version of model merging based on task vectors. Thanks!
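For context, task-vector merging (which MAP builds on) combines models through their differences from a shared base; a sketch of the standard formulation is below, where the per-task coefficients $\lambda_i$ are what methods like MAP try to choose well.

$$\tau_i = \theta_i^{\mathrm{ft}} - \theta_{\mathrm{base}}, \qquad \theta_{\mathrm{merged}} = \theta_{\mathrm{base}} + \sum_i \lambda_i \, \tau_i$$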
I want to share a new merging algorithm: Geometric Median. It is simple, somewhat stackable, and currently **tested with 116 unfiltered SDXL models**. Sadly, due to the historic...
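As background, the geometric median of weight tensors $\theta_1,\dots,\theta_n$ is the point minimizing the total Euclidean distance, commonly approximated with Weiszfeld's iteration; this is the textbook definition, not necessarily the exact variant proposed in the issue.

$$\hat{\theta} = \arg\min_{\theta} \sum_{i=1}^{n} \lVert \theta - \theta_i \rVert_2, \qquad \theta^{(t+1)} = \frac{\sum_{i} \theta_i \,/\, \lVert \theta^{(t)} - \theta_i \rVert_2}{\sum_{i} 1 \,/\, \lVert \theta^{(t)} - \theta_i \rVert_2}$$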