Awesome-Model-Merging-Methods-Theories-Applications
Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities. arXiv:2408.07666.
[!TIP] If you have a relevant paper not included in the library, or have any clarification about the content of the paper, please contact us!
A comprehensive list of papers about 'Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities'.
Abstract
Model merging is an efficient empowerment technique in the machine learning community that does not require the collection of raw training data and does not require expensive computation. As model merging becomes increasingly prevalent across various fields, it is crucial to understand the available model merging techniques comprehensively. However, there is a significant gap in the literature regarding a systematic and thorough review of these techniques. To address this gap, this survey provides a comprehensive overview of model merging methods and theories, their applications in various domains and settings, and future research directions. Specifically, we first propose a new taxonomic approach that exhaustively discusses existing model merging methods. Secondly, we discuss the application of model merging techniques in large language models, multimodal large language models, and 10+ machine learning subfields, including continual learning, multi-task learning, few-shot learning, etc. Finally, we highlight the remaining challenges of model merging and discuss future research directions.

Citation
If you find our paper or this resource helpful, please consider citing:
@article{Survery_ModelMerging_2024,
title={Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities},
author={Yang, Enneng and Shen, Li and Guo, Guibing and Wang, Xingwei and Cao, Xiaochun and Zhang, Jie and Tao, Dacheng},
journal={arXiv preprint arXiv:2408.07666},
year={2024}
}
Thanks!
Framework
- Awesome-Model-Merging-Methods-Theories-Applications
  - Advanced Methods
    - Pre-Merging Methods
      - Linearization Fine-tuning
      - Architecture Transformation
      - Weight Alignment
      - Others
    - During Merging Methods
      - Basic Merging Methods
      - Weighted-based Merging Methods
      - Subspace-based Merging Methods
      - Routing-based Merging Methods
      - Post-calibration based Methods
    - Theories and Analysis of Model Merging
  - Application of Model Merging in Foundation Models
    - Model Merging in Large Language Model
      - Human Preference Alignment for LLMs
      - Detoxification of LLMs
      - Knowledge Unlearning of LLMs
      - Faster Training of LLMs
      - Combine the Capabilities of Expert LLMs
    - Model Merging in Multimodal Large Language Models
      - Model Merging for Multimodal Fusion
      - Model Merging for Cross-Modal Knowledge Transfer
    - Model Merging in Image Generative Models
      - Style Mixing in Generative Models
      - Reducing Training Cost of Generative Models
      - Enhancing the Faithfulness of Diffusion Models
  - Application of Model Merging in Different Machine Learning Subfields
    - Model Merging in Continual Learning
      - Model Merging to Mitigate Catastrophic Forgetting
    - Model Merging in Multi-Task/Multi-Objective/Multi-Domain/Auxiliary Learning
      - Model Merging for Knowledge Transfer in Multi-Task Learning
      - Model Merging for Knowledge Transfer in Multi-Objective Optimization
      - Model Merging for Knowledge Transfer in Multi-Domain Learning
      - Model Merging for Knowledge Transfer in Auxiliary Learning
    - Model Merging in Out-of-Distribution/Domain Generalization
      - Model Merging for Better Out-of-Distribution Generalization
      - Model Merging for Better Domain Generalization
    - Model Merging in Federated Learning
      - Model Merging for Local Knowledge Aggregation
    - Model Merging in Zero-shot/Few-shot Learning
      - Model Merging for Cross-task Generalization in Zero-shot Learning
      - Model Merging for Cross-task Generalization in Few-shot Learning
    - Model Merging in Adversarial Learning
      - Model Merging as an Attack
      - Model Merging as a Defense
    - Other Applications
Advanced Methods

Pre-Merging Methods
Linearization Fine-tuning
Paper Title | Year | Conference/Journal |
---|---|---|
Fine-Tuning Linear Layers Only Is a Simple yet Effective Way for Task Arithmetic | 2024 | Arxiv |
Tangent Transformers for Composition, Privacy and Removal | 2024 | ICLR |
Parameter Efficient Multi-task Model Fusion with Partial Linearization | 2024 | ICLR |
Task Arithmetic in the Tangent Space: Improved Editing of Pre-Trained Models | 2023 | NeurIPS |
Architecture Transformation
Paper Title | Year | Conference/Journal |
---|---|---|
Knowledge fusion of large language models | 2024 | ICLR |
Knowledge Fusion of Chat LLMs: A Preliminary Technical Report | 2024 | Arxiv |
On Cross-Layer Alignment for Model Fusion of Heterogeneous Neural Networks | 2023 | ICASSP |
GAN Cocktail: mixing GANs without dataset access | 2022 | ECCV |
Weight Alignment
Others
Paper Title | Year | Conference/Journal |
---|---|---|
Weight Scope Alignment: A Frustratingly Easy Method for Model Merging | 2024 | Arxiv |
During Merging Methods
Basic Merging Methods
Paper Title | Year | Conference/Journal |
---|---|---|
Composing parameter-efficient modules with arithmetic operation | 2023 | NeurIPS |
Editing models with task arithmetic | 2023 | ICLR |
Model fusion via optimal transport | 2020 | NeurIPS |
Weight averaging for neural networks and local resampling schemes | 1996 | AAAI Workshop |
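
For readers new to the area, the two simplest ideas behind the papers listed above can be sketched in a few lines: plain weight averaging, and task arithmetic, where a task vector is the difference between fine-tuned and pre-trained weights. This is only an illustrative sketch, not code from any listed paper. The `StateDict` type, the function names, the toy weights, and the scaling coefficient `lam` are all assumptions made for the example, and real models would use tensors rather than Python lists.

```python
from typing import Dict, List

# Hypothetical stand-in for a model's named parameters (real code would use tensors).
StateDict = Dict[str, List[float]]


def average_weights(models: List[StateDict]) -> StateDict:
    """Element-wise mean of parameters across models that share one architecture."""
    n = len(models)
    return {
        name: [sum(m[name][i] for m in models) / n for i in range(len(models[0][name]))]
        for name in models[0]
    }


def task_vector(finetuned: StateDict, pretrained: StateDict) -> StateDict:
    """Task vector: fine-tuned weights minus pre-trained weights."""
    return {k: [f - p for f, p in zip(finetuned[k], pretrained[k])] for k in pretrained}


def task_arithmetic(pretrained: StateDict, vectors: List[StateDict], lam: float) -> StateDict:
    """Add the scaled sum of task vectors back onto the pre-trained weights."""
    merged = {}
    for k, base in pretrained.items():
        summed = [sum(v[k][i] for v in vectors) for i in range(len(base))]
        merged[k] = [b + lam * s for b, s in zip(base, summed)]
    return merged


# Toy weights, chosen only for illustration.
pretrained = {"w": [1.0, 2.0]}
model_a = {"w": [2.0, 2.0]}
model_b = {"w": [1.0, 4.0]}

averaged = average_weights([model_a, model_b])
merged = task_arithmetic(
    pretrained,
    [task_vector(model_a, pretrained), task_vector(model_b, pretrained)],
    lam=1.0,
)
print(averaged, merged)
```

In practice the coefficient `lam` is typically tuned on held-out data; the value used here is arbitrary.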
Weighted-based Merging Methods
Paper Title | Year | Conference/Journal |
---|---|---|
Knowledge Composition using Task Vectors with Learned Anisotropic Scaling | 2024 | Arxiv |
MetaGPT: Merging Large Language Models Using Model Exclusive Task Arithmetic | 2024 | Arxiv |
Checkpoint Merging via Bayesian Optimization in LLM Pretraining | 2024 | Arxiv |
Arcee’s MergeKit: A Toolkit for Merging Large Language Models | 2024 | Arxiv |
Evolutionary optimization of model merging recipes | 2024 | Arxiv |
AdaMerging: Adaptive Model Merging for Multi-Task Learning | 2024 | ICLR |
Model Merging by Uncertainty-Based Gradient Matching | 2024 | ICLR |
Merging by Matching Models in Task Subspaces | 2024 | TMLR |
Fisher Mask Nodes for Language Model Merging | 2024 | LREC-COLING |
Erasure Coded Neural Network Inference via Fisher Averaging | 2024 | ISIT |
Dataless Knowledge Fusion by Merging Weights of Language Models | 2023 | ICLR |
Merging models with fisher-weighted averaging | 2022 | NeurIPS |
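
As a rough illustration of what "weighted-based" merging means, the sketch below shows a per-parameter weighted average in the style of Fisher-weighted averaging: each model's parameter is weighted by an importance score, and the result is normalized by the sum of the scores. It is a simplified sketch under stated assumptions, not the implementation from any listed paper. The `Params` type, the function name, the `eps` stabilizer, and the toy importance values are hypothetical; in practice the importance scores would be estimated from data (for example, from squared gradients) rather than written by hand.

```python
from typing import Dict, List

# Hypothetical stand-in for named parameters and their per-parameter importance scores.
Params = Dict[str, List[float]]


def fisher_weighted_average(
    models: List[Params], fishers: List[Params], eps: float = 1e-8
) -> Params:
    """Merge models by weighting each parameter with its importance score."""
    merged: Params = {}
    for name in models[0]:
        merged[name] = []
        for i in range(len(models[0][name])):
            # Importance-weighted sum of this parameter across models.
            num = sum(f[name][i] * m[name][i] for m, f in zip(models, fishers))
            # Total importance; eps (an assumption) guards against division by zero.
            den = sum(f[name][i] for f in fishers)
            merged[name].append(num / (den + eps))
    return merged


# Toy values, chosen only for illustration.
model_a = {"w": [0.0, 4.0]}
model_b = {"w": [2.0, 0.0]}
fisher_a = {"w": [1.0, 3.0]}
fisher_b = {"w": [1.0, 1.0]}

merged_fisher = fisher_weighted_average([model_a, model_b], [fisher_a, fisher_b])
print(merged_fisher)
```

With equal importance scores this reduces to plain averaging; unequal scores pull each merged parameter toward the model that is treated as more certain about it.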
Subspace-based Merging Methods
Routing-based Merging Methods
Post-calibration based Methods
Paper Title | Year | Conference/Journal |
---|---|---|
Representation Surgery for Multi-Task Model Merging | 2024 | ICML |
Theories and Analysis of Model Merging
Application of Model Merging in Foundation Models

Model Merging in Large Language Model
Human Preference Alignment for LLMs
Detoxification of LLMs
Paper Title | Year | Conference/Journal |
---|---|---|
Separate the Wheat from the Chaff: Model Deficiency Unlearning via Parameter-Efficient Module Operation | 2024 | AAAI |
Mitigating Social Biases in Language Models through Unlearning | 2024 | Arxiv |
Fine-Grained Detoxification via Instance-Level Prefixes for Large Language Models | 2024 | Arxiv |
Composing Parameter-Efficient Modules with Arithmetic Operation | 2023 | NeurIPS |
Editing models with task arithmetic | 2023 | ICLR |
Knowledge Unlearning of LLMs
Paper Title | Year | Conference/Journal |
---|---|---|
Strong Copyright Protection for Language Models via Adaptive Model Fusion | 2024 | ICML |
Avoiding Copyright Infringement via Machine Unlearning | 2024 | Arxiv |
Towards Safer Large Language Models through Machine Unlearning | 2024 | ACL |
Editing models with task arithmetic | 2023 | ICLR |
Forgetting before Learning: Utilizing Parametric Arithmetic for Knowledge Updating in Large Language Model | 2023 | Arxiv |
Fuse to Forget: Bias Reduction and Selective Memorization through Model Fusion | 2023 | Arxiv |
Faster Training of LLMs
Paper Title | Year | Conference/Journal |
---|---|---|
DEM: Distribution Edited Model for Training with Mixed Data Distributions | 2024 | Arxiv |
Checkpoint Merging via Bayesian Optimization in LLM Pretraining | 2024 | Arxiv |
ColD Fusion: Collaborative Descent for Distributed Multitask Finetuning | 2023 | ACL |
Early Weight Averaging meets High Learning Rates for LLM Pre-training | 2023 | NeurIPS Workshop |
Stop wasting my time! saving days of imagenet and bert training with latest weight averaging | 2022 | NeurIPS Workshop |
Fusing finetuned models for better pretraining | 2022 | Arxiv |
Combine the Capabilities of Expert LLMs
Paper Title | Year | Conference/Journal |
---|---|---|
LLM Merging: Building LLMs Efficiently through Merging | 2024 | NeurIPS 2024 Competition Track |
Extend Model Merging from Fine-Tuned to Pre-Trained Large Language Models via Weight Disentanglement | 2024 | Arxiv |
It’s Morphing Time: Unleashing the Potential of Multiple LLMs via Multi-objective Optimization | 2024 | Arxiv |
MetaGPT: Merging Large Language Models Using Model Exclusive Task Arithmetic | 2024 | Arxiv |
Knowledge fusion of large language models | 2024 | ICLR |
Language models are super mario: Absorbing abilities from homologous models as a free lunch | 2024 | ICML |
Controlled Text Generation via Language Model Arithmetic | 2024 | ICML |
Evolutionary optimization of model merging recipes | 2024 | Arxiv |
Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM | 2024 | Arxiv |
Knowledge Fusion of Chat LLMs: A Preliminary Technical Report | 2024 | Arxiv |
Model Merging in Multimodal Large Language Models
Model Merging for Multimodal Fusion
Paper Title | Year | Conference/Journal |
---|---|---|
Jointly training large autoregressive multimodal models | 2024 | ICLR |
Model Composition for Multimodal Large Language Models | 2024 | ACL |
π-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation | 2023 | ICML |
An Empirical Study of Multimodal Model Merging | 2023 | EMNLP |
UnIVAL: Unified Model for Image, Video, Audio and Language Tasks | 2023 | TMLR |
Model Merging for Cross-Modal Knowledge Transfer
Paper Title | Year | Conference/Journal |
---|---|---|
Multimodal Attention Merging for Improved Speech Recognition and Audio Event Classification | 2024 | ICASSP Workshop |
Model Merging in Image Generative Models
Style Mixing in Generative Models
Paper Title | Year | Conference/Journal |
---|---|---|
Diffusion Soup: Model Merging for Text-to-Image Diffusion Models | 2024 | Arxiv |
MaxFusion: Plug&Play Multi-Modal Generation in Text-to-Image Diffusion Models | 2024 | Arxiv |
MoLE: Mixture of LoRA Experts | 2024 | ICLR |
Merging loras | 2023 | (github) |
ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs | 2023 | Arxiv |
GAN Cocktail: mixing GANs without dataset access | 2022 | ECCV |
Reducing Training Cost of Generative Models
Paper Title | Year | Conference/Journal |
---|---|---|
Linear Combination of Saved Checkpoints Makes Consistency and Diffusion Models Better | 2024 | Arxiv |
A Unified Module for Accelerating STABLE-DIFFUSION: LCM-LORA | 2024 | Arxiv |
Enhancing the Faithfulness of Diffusion Models
Paper Title | Year | Conference/Journal |
---|---|---|
SELMA: Learning and Merging Skill-Specific Text-to-Image Experts with Auto-Generated Data | 2024 | Arxiv |
Application of Model Merging in Different Machine Learning Subfields

Model Merging in Continual Learning
Model Merging to Mitigate Catastrophic Forgetting
Model Merging in Multi-Task/Multi-Objective/Multi-Domain/Auxiliary Learning
Model Merging for Knowledge Transfer in Multi-Task Learning
Paper Title | Year | Conference/Journal |
---|---|---|
Task Prompt Vectors: Effective Initialization through Multi-Task Soft-Prompt Transfer | 2024 | Arxiv |
Evolutionary optimization of model merging recipes | 2024 | Arxiv |
Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch | 2024 | ICML |
Representation Surgery for Multi-Task Model Merging | 2024 | ICML |
Merging Multi-Task Models via Weight-Ensembling Mixture of Experts | 2024 | ICML |
ZipIt! Merging Models from Different Tasks without Training | 2024 | ICLR |
AdaMerging: Adaptive Model Merging for Multi-Task Learning | 2024 | ICLR |
Resolving Interference When Merging Models | 2023 | NeurIPS |
Editing models with task arithmetic | 2023 | ICLR |
Model Merging for Knowledge Transfer in Multi-Objective Optimization
Paper Title | Year | Conference/Journal |
---|---|---|
You Only Merge Once: Learning the Pareto Set of Preference-Aware Model Merging | 2024 | Arxiv |
Towards Efficient Pareto Set Approximation via Mixture of Experts Based Model Fusion | 2024 | Arxiv |
MAP: Low-compute Model Merging with Amortized Pareto Fronts via Quadratic Approximation | 2024 | Arxiv |
Model Merging for Knowledge Transfer in Multi-Domain Learning
Paper Title | Year | Conference/Journal |
---|---|---|
DEM: Distribution Edited Model for Training with Mixed Data Distributions | 2024 | Arxiv |
Merging Vision Transformers from Different Tasks and Domains | 2023 | Arxiv |
Model Merging for Knowledge Transfer in Auxiliary Learning
Paper Title | Year | Conference/Journal |
---|---|---|
ForkMerge: Mitigating Negative Transfer in Auxiliary-Task Learning | 2023 | NeurIPS |
Model Merging in Out-of-Distribution/Domain Generalization
Model Merging for Better Out-of-Distribution Generalization
Model Merging for Better Domain Generalization
Paper Title | Year | Conference/Journal |
---|---|---|
Training-Free Model Merging for Multi-target Domain Adaptation | 2024 | Arxiv |
Ensemble of averages: Improving model selection and boosting performance in domain generalization | 2022 | NeurIPS |
Swad: Domain generalization by seeking flat minima | 2021 | NeurIPS |
Model Merging in Federated Learning
Model Merging for Local Knowledge Aggregation
Model Merging in Zero-shot/Few-shot Learning
Model Merging for Cross-task Generalization in Zero-shot Learning
Model Merging for Cross-task Generalization in Few-shot Learning
Model Merging in Adversarial Learning
Model Merging as an Attack
Paper Title | Year | Conference/Journal |
---|---|---|
BadMerging: Backdoor Attacks Against Model Merging | 2024 | CCS |
LoRA-as-an-Attack! Piercing LLM Safety Under The Share-and-Play Scenario | 2024 | ACL |
Model Merging as a Defense
Paper Title | Year | Conference/Journal |
---|---|---|
Here’s a Free Lunch: Sanitizing Backdoored Models with Model Merge | 2024 | ACL |
Merging Improves Self-Critique Against Jailbreak Attacks | 2024 | Arxiv |
Have You Merged My Model? On The Robustness of Large Language Model IP Protection Methods Against Model Merging | 2024 | Arxiv |
Revisiting adapters with adversarial training | 2023 | ICLR |
Seasoning model soups for robustness to adversarial and natural distribution shifts | 2023 | CVPR |
Other Applications
Paper Title | Year | Conference/Journal |
---|---|---|
Emotion Arithmetic: Emotional Speech Synthesis via Weight Space Interpolation | 2024 | Interspeech |
Erasure Coded Neural Network Inference via Fisher Averaging | 2024 | ISIT |
SQL-GEN: Bridging the Dialect Gap for Text-to-SQL Via Synthetic Data And Model Merging | 2024 | Arxiv |
Scaling Up Personalized Image Aesthetic Assessment via Task Vector Customization | 2024 | Arxiv |
Contact
We welcome contributions from all researchers to this repository on model merging in foundation models and machine learning.
If you have a related paper that was not added to the library, please contact us.
Email: [email protected] / [email protected]