awesome-model-compression

This repository collects papers (mainly from arxiv.org) on model compression:
Structure;
Distillation;
Binarization;
Quantization;
Pruning;
Low Rank.

Some papers and links were also collected from the awesome resources below:


1990

  • D. Hammerstrom. A VLSI architecture for high-performance, low-cost, on-chip learning. In IJCNN International Joint Conference on Neural Networks, pages 537–544 vol. 2, 1990.

1991

  • J. L. Holi and J. N. Hwang. Finite precision error analysis of neural network hardware implementations. In IJCNN-91-Seattle International Joint Conference on Neural Networks, pages 519–525 vol. 1, 1991.

1993

  • Hassibi, Babak, and David G. Stork. Second order derivatives for network pruning: Optimal brain surgeon. In Advances in Neural Information Processing Systems, 1993. (A minimal pruning sketch follows this entry.)
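
The Optimal Brain Surgeon paper above prunes the weight with the lowest second-order saliency and then adjusts the remaining weights to compensate. Below is a minimal NumPy sketch of one OBS step, assuming the full Hessian of the loss is available (the function and variable names are illustrative, not from the paper):

```python
import numpy as np

def obs_prune_step(w, H):
    """One Optimal Brain Surgeon step (Hassibi & Stork, 1993):
    remove the single weight whose deletion least increases the
    loss, and update the surviving weights to compensate."""
    H_inv = np.linalg.inv(H)
    # Saliency of weight q: L_q = w_q^2 / (2 [H^-1]_qq)
    saliency = w ** 2 / (2.0 * np.diag(H_inv))
    q = int(np.argmin(saliency))                    # cheapest weight to prune
    # Optimal update of all weights given that w_q is forced to zero
    w_new = w - (w[q] / H_inv[q, q]) * H_inv[:, q]
    w_new[q] = 0.0                                  # exact zero (numerical cleanup)
    return q, w_new

# Toy usage with a random positive-definite Hessian
rng = np.random.default_rng(0)
A = rng.normal(size=(6, 6))
H = A @ A.T + 6 * np.eye(6)
w = rng.normal(size=6)
q, w_pruned = obs_prune_step(w, H)
print(f"pruned weight {q}; new weights: {w_pruned}")
```

For real networks the full Hessian is intractable, which is why later work approximates it, e.g. layer-wise or with a diagonal approximation.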

1995

1997

1998

2000

2001

2006

2011

2012

2013

2014

2015

2016

2017

2018

2019


Projects

  • NVIDIA TensorRT: Programmable inference accelerator;
  • Tencent/PocketFlow: An Automatic Model Compression (AutoMC) framework for developing smaller and faster AI applications;
  • dmlc/tvm: Open deep learning compiler stack for CPUs, GPUs, and specialized accelerators;
  • Tencent/ncnn: A high-performance neural network inference framework optimized for mobile platforms;
  • pytorch/glow: Compiler for neural network hardware accelerators;
  • NervanaSystems/neon: Intel® Nervana™ reference deep learning framework committed to best performance on all hardware;
  • NervanaSystems/distiller: Neural Network Distiller by Intel AI Lab, a Python package for neural network compression research;
  • MUSCO: Framework for model compression using tensor decompositions (PyTorch);
  • OAID/Tengine: A lightweight, high-performance, modular inference engine for embedded devices;
  • fpeder/espresso: Efficient forward propagation for BCNNs;
  • TensorFlow Lite: An open source deep learning framework for on-device inference;
  • Core ML: Reduce the storage used by the Core ML model inside your app bundle;
  • pytorch-tensor-decompositions: PyTorch implementation of the [1412.6553] and [1511.06530] tensor decomposition methods for convolutional layers;
  • tensorflow/quantize:
  • mxnet/quantization: Examples of quantizing an FP32 model with Intel® MKL-DNN or cuDNN;
  • TensoRT4-Example:
  • NAF-tensorflow: "Continuous Deep Q-Learning with Model-based Acceleration" in TensorFlow;
  • Mayo: Deep learning framework with fine- and coarse-grained pruning, network slimming, and quantization methods;
  • Keras compressor: Compression using low-rank approximations, SVD for matrices, Tucker for tensors (see the SVD sketch after this list);
  • Caffe compressor: K-means based quantization (see the K-means sketch after this list).
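
The Keras compressor entry mentions low-rank approximation with SVD. The core idea, framework-agnostic: factor a dense layer's m × n weight matrix into two thinner rank-r factors, cutting storage and multiplies from m·n to r·(m+n). A minimal NumPy sketch (names are illustrative):

```python
import numpy as np

def svd_compress(W, rank):
    """Rank-r approximation of a weight matrix W (m x n):
    W ~= U_r @ V_r with U_r (m x r) and V_r (r x n), so a dense
    layer y = W x becomes two smaller layers y = U_r (V_r x)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    U_r = U[:, :rank] * s[:rank]   # fold singular values into the left factor
    V_r = Vt[:rank, :]
    return U_r, V_r

W = np.random.randn(512, 256).astype(np.float32)
U_r, V_r = svd_compress(W, rank=32)
rel_err = np.linalg.norm(W - U_r @ V_r) / np.linalg.norm(W)
print(f"rank-32 relative error: {rel_err:.3f}, "
      f"params: {W.size} -> {U_r.size + V_r.size}")
```

By the Eckart–Young theorem, truncated SVD gives the best rank-r approximation in Frobenius norm; in practice the factored layers are then fine-tuned to recover accuracy.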
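
The Caffe compressor entry mentions K-means based quantization, i.e. weight sharing: cluster the weights, then store a small codebook of centroids plus a low-bit cluster index per weight (as popularized by Deep Compression). A minimal sketch using scikit-learn's KMeans (names are illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_quantize(weights, n_clusters=16):
    """Quantize a weight tensor by K-means weight sharing: every
    weight is replaced by its nearest centroid, so only the codebook
    (n_clusters floats) and per-weight indices (log2(n_clusters) bits
    each) need to be stored."""
    flat = weights.reshape(-1, 1)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(flat)
    codebook = km.cluster_centers_.ravel()
    indices = km.labels_
    return codebook[indices].reshape(weights.shape), codebook, indices

w = np.random.randn(64, 64).astype(np.float32)
w_q, codebook, idx = kmeans_quantize(w, n_clusters=16)  # 4-bit indices
print(f"mean absolute quantization error: {np.abs(w - w_q).mean():.4f}")
```

With 16 clusters each index fits in 4 bits, roughly an 8x size reduction versus 32-bit floats, before any retraining of the centroids.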

Others