Knowledge-Distillation-Paper

This repository maintains a collection of important papers on knowledge distillation.

  • Knowledge-Distillation-Paper
    • Pioneering Papers
    • Survey Papers
    • Distillation Accelerates Diffusion Models
    • Feature Distillation
    • Online Knowledge Distillation
    • Multi-Teacher Knowledge Distillation
    • Data-Free Knowledge Distillation
    • Distillation for Segmentation
    • Useful Resources

Pioneering Papers

  • Model Compression, KDD 2006

    • https://dl.acm.org/doi/abs/10.1145/1150402.1150464
    • Cristian Buciluǎ, Rich Caruana, Alexandru Niculescu-Mizil.
  • Do Deep Nets Really Need to be Deep?, NeurIPS 2014

    • https://arxiv.org/abs/1312.6184
    • Lei Jimmy Ba, Rich Caruana.
  • Distilling the Knowledge in a Neural Network, NeurIPS-workshop 2014

    • https://arxiv.org/abs/1503.02531
    • https://neurips.cc/virtual/2014/workshop/4294
    • Geoffrey Hinton, Oriol Vinyals, Jeff Dean.
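
For reference, the core recipe shared by these pioneering works is the softened-logits loss. The sketch below is a minimal PyTorch-style illustration; `temperature` and `alpha` are illustrative placeholders, not values taken from any particular paper in this list.

```python
# Minimal sketch of the classic softened-logits KD loss (Hinton et al. style).
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, temperature=4.0, alpha=0.9):
    # Soft targets: KL divergence between temperature-scaled distributions,
    # rescaled by T^2 so gradient magnitudes stay comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * temperature ** 2
    # Hard targets: ordinary cross-entropy with the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```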

Survey Papers

  • Knowledge Distillation and Student-Teacher Learning for Visual Intelligence: A Review and New Outlooks, TPAMI 2022

    • https://arxiv.org/abs/2004.05937
    • Lin Wang, Kuk-Jin Yoon.
  • Knowledge Distillation: A Survey, IJCV 2021

    • https://arxiv.org/abs/2006.05525
    • Jianping Gou, Baosheng Yu, Stephen John Maybank, Dacheng Tao.

Distillation Accelerates Diffusion Models

Extremely promising! (A schematic loss sketch follows the list below.)

  • Knowledge Distillation in Iterative Generative Models for Improved Sampling Speed [Tensorflow]

    • https://arxiv.org/abs/2101.02388
    • Eric Luhman, Troy Luhman
  • Progressive Distillation for Fast Sampling of Diffusion Models, ICLR 2022 [Tensorflow]

    • https://arxiv.org/abs/2202.00512
    • Tim Salimans, Jonathan Ho
  • Accelerating Diffusion Sampling with Classifier-based Feature Distillation, ICME 2023 [PyTorch]

    • https://arxiv.org/abs/2211.12039
    • Wujie Sun, Defang Chen, Can Wang, Deshi Ye, Yan Feng, Chun Chen
  • Fast Sampling of Diffusion Models via Operator Learning, ICML 2023 [PyTorch]

    • https://arxiv.org/abs/2211.13449
    • Hongkai Zheng, Weili Nie, Arash Vahdat, Kamyar Azizzadenesheli, Anima Anandkumar
  • Consistency Models, ICML 2023 [PyTorch]

    • https://arxiv.org/abs/2303.01469
    • Yang Song, Prafulla Dhariwal, Mark Chen, Ilya Sutskever
  • TRACT: Denoising Diffusion Models with Transitive Closure Time-Distillation [PyTorch]

    • https://arxiv.org/abs/2303.04248
    • David Berthelot, Arnaud Autef, Jierui Lin, Dian Ang Yap, Shuangfei Zhai, Siyuan Hu, Daniel Zheng, Walter Talbott, Eric Gu
  • A Geometric Perspective on Diffusion Models

    • https://arxiv.org/abs/2305.19947
    • Defang Chen, Zhenyu Zhou, Jian-Ping Mei, Chunhua Shen, Chun Chen, Can Wang
  • BOOT: Data-free Distillation of Denoising Diffusion Models with Bootstrapping [PyTorch]

    • https://arxiv.org/abs/2306.05544
    • Jiatao Gu, Shuangfei Zhai, Yizhe Zhang, Lingjie Liu, Josh Susskind
  • Fast ODE-based Sampling for Diffusion Models in Around 5 Steps, CVPR 2024 [PyTorch]

    • https://arxiv.org/abs/2312.00094
    • Zhenyu Zhou, Defang Chen, Can Wang, Chun Chen
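
As rough orientation for this section, the sketch below illustrates the progressive-distillation idea: one student step is trained to match two deterministic teacher sampler steps. `student` and `teacher_step` are hypothetical callables, not code from the repositories linked above.

```python
# Schematic sketch of the progressive-distillation training target.
import torch

def progressive_distillation_loss(student, teacher_step, x_t, t):
    with torch.no_grad():
        # Two consecutive deterministic teacher steps: x_t -> x_{t-1} -> x_{t-2}.
        x_mid = teacher_step(x_t, t)
        x_target = teacher_step(x_mid, t - 1)
    # The student is asked to jump from x_t to x_{t-2} in a single step.
    x_pred = student(x_t, t)
    return torch.mean((x_pred - x_target) ** 2)
```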

Feature Distillation

  • FitNets: Hints for Thin Deep Nets, ICLR 2015 [Theano]

    • https://arxiv.org/abs/1412.6550
    • Adriana Romero, Nicolas Ballas, Samira Ebrahimi Kahou, Antoine Chassang, Carlo Gatta, Yoshua Bengio.
  • Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer, ICLR 2017 [PyTorch]

    • https://arxiv.org/abs/1612.03928
    • Sergey Zagoruyko, Nikos Komodakis.
  • Learning Deep Representations with Probabilistic Knowledge Transfer, ECCV 2018 [Pytorch]

    • https://arxiv.org/abs/1803.10837
    • Nikolaos Passalis, Anastasios Tefas.
  • Relational Knowledge Distillation, CVPR 2019 [Pytorch]

    • https://arxiv.org/abs/1904.05068
    • Wonpyo Park, Dongju Kim, Yan Lu, Minsu Cho.
  • Variational Information Distillation for Knowledge Transfer, CVPR 2019

    • https://arxiv.org/abs/1904.05835
    • Sungsoo Ahn, Shell Xu Hu, Andreas Damianou, Neil D. Lawrence, Zhenwen Dai.
  • Similarity-Preserving Knowledge Distillation, CVPR 2019

    • https://arxiv.org/abs/1907.09682
    • Frederick Tung, Greg Mori.
  • Contrastive Representation Distillation, ICLR 2020 [Pytorch]

    • https://arxiv.org/abs/1910.10699
    • Yonglong Tian, Dilip Krishnan, Phillip Isola.
  • Heterogeneous Knowledge Distillation using Information Flow Modeling, CVPR 2020 [Pytorch]

    • https://arxiv.org/abs/2005.00727
    • Nikolaos Passalis, Maria Tzelepi, Anastasios Tefas.
  • Cross-Layer Distillation with Semantic Calibration, AAAI 2021 [Pytorch][TKDE]

    • https://arxiv.org/abs/2012.03236
    • Defang Chen, Jian-Ping Mei, Yuan Zhang, Can Wang, Zhe Wang, Yan Feng, Chun Chen.
  • Distilling Knowledge via Knowledge Review, CVPR 2021 [Pytorch]

    • https://arxiv.org/abs/2104.09044
    • Pengguang Chen, Shu Liu, Hengshuang Zhao, Jiaya Jia.
  • Distilling Holistic Knowledge with Graph Neural Networks, ICCV 2021 [Pytorch]

    • https://arxiv.org/abs/2108.05507
    • Sheng Zhou, Yucheng Wang, Defang Chen, Jiawei Chen, Xin Wang, Can Wang, Jiajun Bu.
  • Decoupled Knowledge Distillation, CVPR 2022 [Pytorch]

    • https://arxiv.org/abs/2203.08679
    • Borui Zhao, Quan Cui, Renjie Song, Yiyu Qiu, Jiajun Liang
  • Knowledge Distillation with the Reused Teacher Classifier, CVPR 2022 [Pytorch]

    • https://arxiv.org/abs/2203.14001
    • Defang Chen, Jian-Ping Mei, Hailin Zhang, Can Wang, Yan Feng, Chun Chen.
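
A minimal sketch of the feature-matching idea behind many of these papers (FitNets-style "hints"); the 1x1 convolution adapter and the MSE objective are illustrative assumptions, and individual papers replace them with attention maps, relations, or contrastive objectives.

```python
# Minimal sketch of a FitNets-style "hint" loss: an intermediate student feature
# map is regressed onto the matching teacher feature map.
import torch.nn as nn
import torch.nn.functional as F

class HintLoss(nn.Module):
    def __init__(self, student_channels, teacher_channels):
        super().__init__()
        # Adapter so the student feature matches the teacher's channel width.
        self.regressor = nn.Conv2d(student_channels, teacher_channels, kernel_size=1)

    def forward(self, student_feat, teacher_feat):
        # Teacher features are treated as fixed targets.
        return F.mse_loss(self.regressor(student_feat), teacher_feat.detach())
```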

Online Knowledge Distillation

  • Deep Mutual Learning, CVPR 2018 [TensorFlow]

    • https://arxiv.org/abs/1706.00384
    • Ying Zhang, Tao Xiang, Timothy M. Hospedales, Huchuan Lu.
  • Large scale distributed neural network training through online distillation, ICLR 2018

    • https://arxiv.org/abs/1804.03235
    • Rohan Anil, Gabriel Pereyra, Alexandre Passos, Robert Ormandi, George E. Dahl and Geoffrey E. Hinton.
  • Knowledge Distillation by On-the-Fly Native Ensemble, NeurIPS 2018 [PyTorch]

    • https://arxiv.org/abs/1806.04606
    • Xu Lan, Xiatian Zhu, Shaogang Gong.
  • Online Knowledge Distillation with Diverse Peers, AAAI 2020 [Pytorch]

    • https://arxiv.org/abs/1912.00350
    • Defang Chen, Jian-Ping Mei, Can Wang, Yan Feng and Chun Chen.
  • Feature-map-level Online Adversarial Knowledge Distillation, ICML 2020

    • https://arxiv.org/abs/2002.01775
    • Inseop Chung, SeongUk Park, Jangho Kim, Nojun Kwak.
  • Peer collaborative learning for online knowledge distillation, AAAI 2021

    • https://arxiv.org/abs/2006.04147
    • Guile Wu, Shaogang Gong.
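
For orientation, a minimal sketch of the mutual-learning objective used in this line of work: two peer networks train together, each adding a KL term toward the other's (detached) predictions. Variable names and the unit KL weight are illustrative.

```python
# Minimal sketch of a two-peer mutual-learning loss.
import torch.nn.functional as F

def mutual_learning_losses(logits_a, logits_b, labels):
    kl_a = F.kl_div(F.log_softmax(logits_a, dim=1),
                    F.softmax(logits_b.detach(), dim=1), reduction="batchmean")
    kl_b = F.kl_div(F.log_softmax(logits_b, dim=1),
                    F.softmax(logits_a.detach(), dim=1), reduction="batchmean")
    loss_a = F.cross_entropy(logits_a, labels) + kl_a
    loss_b = F.cross_entropy(logits_b, labels) + kl_b
    return loss_a, loss_b
```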

Multi-Teacher Knowledge Distillation

  • Distilling knowledge from ensembles of neural networks for speech recognition, INTERSPEECH 2016

    • https://www.isca-speech.org/archive_v0/Interspeech_2016/pdfs/1190.PDF
    • Austin Waters, Yevgen Chebotar.
  • Efficient Knowledge Distillation from an Ensemble of Teachers, INTERSPEECH 2017

    • https://isca-speech.org/archive_v0/Interspeech_2017/pdfs/0614.PDF
    • Takashi Fukuda, Masayuki Suzuki, Gakuto Kurata, Samuel Thomas, Jia Cui, Bhuvana Ramabhadran.
  • Agree to Disagree: Adaptive Ensemble Knowledge Distillation in Gradient Space, NeurIPS 2020 [Pytorch]

    • https://proceedings.neurips.cc/paper/2020/hash/91c77393975889bd08f301c9e13a44b7-Abstract.html
    • Shangchen Du, Shan You, Xiaojie Li, Jianlong Wu, Fei Wang, Chen Qian, Changshui Zhang.
  • Reinforced Multi-Teacher Selection for Knowledge Distillation, AAAI 2021

    • https://arxiv.org/abs/2012.06048
    • Fei Yuan, Linjun Shou, Jian Pei, Wutao Lin, Ming Gong, Yan Fu, Daxin Jiang
  • Confidence-Aware Multi-Teacher Knowledge Distillation, ICASSP 2022 [Pytorch]

    • https://arxiv.org/abs/2201.00007
    • Hailin Zhang, Defang Chen, Can Wang.
  • Adaptive Multi-Teacher Knowledge Distillation with Meta-Learning, ICME 2023 [Pytorch]

    • https://arxiv.org/abs/2306.06634
    • Hailin Zhang, Defang Chen, Can Wang.
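
A minimal sketch of the common multi-teacher setup: the soft target is a weighted combination of the teachers' softened outputs. The uniform default weights are a placeholder; the papers above learn or adapt the weights (e.g., by confidence or meta-learning) instead.

```python
# Minimal sketch of multi-teacher knowledge distillation with weighted teachers.
import torch.nn.functional as F

def multi_teacher_kd_loss(student_logits, teacher_logits_list,
                          temperature=4.0, weights=None):
    if weights is None:
        # Placeholder: equal weights for all teachers.
        weights = [1.0 / len(teacher_logits_list)] * len(teacher_logits_list)
    soft_target = sum(w * F.softmax(t / temperature, dim=1)
                      for w, t in zip(weights, teacher_logits_list))
    return F.kl_div(F.log_softmax(student_logits / temperature, dim=1),
                    soft_target, reduction="batchmean") * temperature ** 2
```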

Data-Free Knowledge Distillation

  • Data-Free Knowledge Distillation for Deep Neural Networks, NeurIPS-workshop 2017 [Tensorflow]

    • https://arxiv.org/abs/1710.07535
    • Raphael Gontijo Lopes, Stefano Fenu, Thad Starner
  • DAFL: Data-Free Learning of Student Networks, ICCV 2019 [PyTorch]

    • https://arxiv.org/abs/1904.01186
    • Hanting Chen, Yunhe Wang, Chang Xu, Zhaohui Yang, Chuanjian Liu, Boxin Shi, Chunjing Xu, Chao Xu, Qi Tian
  • Zero-Shot Knowledge Distillation in Deep Networks, ICML 2019 [Tensorflow]

    • https://arxiv.org/abs/1905.08114
    • Gaurav Kumar Nayak, Konda Reddy Mopuri, Vaisakh Shaj, R. Venkatesh Babu, Anirban Chakraborty
  • Zero-shot Knowledge Transfer via Adversarial Belief Matching, NeurIPS 2019 [Pytorch]

    • https://arxiv.org/abs/1905.09768
    • Paul Micaelli, Amos Storkey
  • Dreaming to Distill: Data-free Knowledge Transfer via DeepInversion, CVPR 2020 [Pytorch]

    • https://arxiv.org/abs/1912.08795
    • Hongxu Yin, Pavlo Molchanov, Zhizhong Li, Jose M. Alvarez, Arun Mallya, Derek Hoiem, Niraj K. Jha, Jan Kautz
  • The Knowledge Within: Methods for Data-Free Model Compression, CVPR 2020

    • https://arxiv.org/abs/1912.01274
    • Matan Haroush, Itay Hubara, Elad Hoffer, Daniel Soudry
  • Contrastive Model Inversion for Data-Free Knowledge Distillation, IJCAI 2021 [Pytorch]

    • https://arxiv.org/abs/2105.08584
    • Gongfan Fang, Jie Song, Xinchao Wang, Chengchao Shen, Xingen Wang, Mingli Song
  • Customizing Synthetic Data for Data-Free Student Learning, ICME 2023 [Pytorch]

    • https://arxiv.org/abs/2307.04542
    • Shiya Luo, Defang Chen, Can Wang
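
A schematic sketch of generator-based data-free distillation: a generator synthesizes inputs from noise and the student mimics the teacher only on those synthetic samples. `generator`, `teacher`, `student`, and all sizes are illustrative assumptions, not code from the linked repositories.

```python
# Schematic sketch of one student update in generator-based data-free KD.
import torch
import torch.nn.functional as F

def data_free_student_step(generator, teacher, student,
                           batch_size=64, noise_dim=100, temperature=4.0):
    z = torch.randn(batch_size, noise_dim)
    with torch.no_grad():
        x_syn = generator(z)        # synthetic images stand in for real data
        t_logits = teacher(x_syn)
    s_logits = student(x_syn)
    # The student only ever sees the teacher's responses to synthetic inputs.
    return F.kl_div(F.log_softmax(s_logits / temperature, dim=1),
                    F.softmax(t_logits / temperature, dim=1),
                    reduction="batchmean") * temperature ** 2
```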

Distillation for Segmentation

  • Structured Knowledge Distillation for Dense Prediction, CVPR 2019, TPAMI 2020 [Pytorch]

    • https://arxiv.org/abs/1903.04197
Yifan Liu, Changyong Shu, Jingdong Wang, Chunhua Shen.
  • Channel-wise Knowledge Distillation for Dense Prediction, ICCV 2021 [Pytorch]

    • https://arxiv.org/abs/2011.13256
    • Changyong Shu, Yifan Liu, Jianfei Gao, Zheng Yan, Chunhua Shen.
  • Cross-Image Relational Knowledge Distillation for Semantic Segmentation, CVPR 2022 [Pytorch]

    • https://arxiv.org/abs/2204.06986
    • Chuanguang Yang, Helong Zhou, Zhulin An, Xue Jiang, Yongjun Xu, Qian Zhang.
  • Holistic Weighted Distillation for Semantic Segmentation, ICME 2023 [Pytorch]

    • Wujie Sun, Defang Chen, Can Wang, Deshi Ye, Yan Feng, Chun Chen.
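
For reference, a minimal sketch of pixel-wise KD on segmentation score maps of shape (N, C, H, W): the softened-logits loss is applied at every spatial location. Channel-wise variants normalize over spatial positions instead; the temperature here is an illustrative placeholder.

```python
# Minimal sketch of pixel-wise KD for dense prediction.
import torch.nn.functional as F

def pixelwise_kd_loss(student_maps, teacher_maps, temperature=1.0):
    # Softmax over the class dimension (dim=1) at each pixel.
    log_p_s = F.log_softmax(student_maps / temperature, dim=1)
    p_t = F.softmax(teacher_maps / temperature, dim=1)
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * temperature ** 2
```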

Useful Resources

  • Acceptance rates of the main AI conferences [Link]
  • AI conference deadlines [Link]
  • CCF conference deadlines [Link]