This is a collection of papers aiming at reducing model sizes or the ASIC/FPGA accelerator for Machine Learning, especially deep neural network related applications. (Inspired by Embedded-Neural-Network.)

You can use the following materials as your entrypoint:

Efficient Processing of Deep Neural Networks: A Tutorial and Survey
the related work of Quantized Neural Networks

Terminologies

Structural pruning (compression): compress CNNs based on removing "less important" filter.

Network Compression

Reduce Precision

Deep neural networks are robust to weight binarization and other non-linear distortions showed that DNN can be robust to more than just weight binarization.

Linear Quantization

Fixed point
- [1502]. Deep Learning with Limited Numerical Precision
- [1610]. QSGD: Communication-Optimal Stochastic Gradient Descent, with Applications to Training Neural Networks
Dynamic fixed point
Binary Quantization
- Theory proof (EBP)
- More practice with 1 bit
- XNOR-Net with slightly large bits (1~2 bit)
  - [1606]. DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients
  - [1608]. Recurrent Neural Networks With Limited Numerical Precision
  - [1609]. Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations. (Text overlap with Binarized Neural Network.)
  - [1702]. Deep Learning with Low Precision by Half-wave Gaussian Quantization
Ternary Quantization
- [1410]. Fixed-point feedforward deep neural network design using weights +1, 0, and -1
- [1605]. Ternary Weight Networks
- [1612]. Trained Ternary Quantization
Other Quantization or others
- [1412]. Compressing Deep Convolutional Networks using Vector Quantization
1-Bit Stochastic Gradient Descent and its Application to Data-Parallel Distributed Training of Speech DNNs
Towards the Limit of Network Quantization.
Loss-aware Binarization of Deep Networks.

Non-linear Quantization

Log Domain Quantization
Parameter Sharing
- Structured Matrices
  - Structured Convolution Matrices for Energy-efficient Deep learning.
  - Structured Transforms for Small-Footprint Deep Learning.
  - An Exploration of Parameter Redundancy in Deep Networks with Circulant Projections.
  - Theoretical Properties for Neural Networks with Weight Matrices of Low Displacement Rank.
- Hashing
  - [1504]. Compressing neural networks with the hashing trick
  - Functional Hashing for Compressing Neural Networks
- [1510]. Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding
- Learning compact recurrent neural networks.

Reduce Number of Operations and Model Size

Exploiting Activation Statistics

To be updated.

Network Pruning

Network Prune: a large amount of the weights in a network are redundant and can be removed (i.e., set to zero).

Remove low saliency
- [9006]. Optimal Brain Damage
- [1506]. Learning both weights and connections for efficient neural network
Energy-based prune
- [1611]. Designing energy-efficient convolutional neural networks using energy-aware pruning
Process sparse weights
Structured pruning

Bayesian network pruning

[1711]. Interpreting Convolutional Neural Networks Through Compression - [notes][arXiv]
[1705]. Structural compression of convolutional neural networks based on greedy filter pruning - [notes][arXiv]

Compact Network Architectures

Before Training
- use 1*1 convolutional layer to reduce the number of channels
- Bottleneck:
  - [1312]. Network in network
  - [1409]. Going deeper with convolutions
  - [1512]. Deep residual learning for image recognition
  - [1602]. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and< 0.5 MB model size
After Training
- Canonical Polyadic (CP) decomposition
  - [1404]. Exploiting linear structure within convolutional networks for efficient evaluation
  - [1412]. Speeding-up convolutional neural networks using fine-tuned cp-decomposition
- Tucker decomposition
  - [1511]. Compression of deep convolutional neural networks for fast and low power mobile applications

Knowledge Distillation

[0600]. Model compression
[1312]. Do deep nets really need to be deep?
[1412]. Fitnets: Hints for thin deep nets
[1503]. Distilling the knowledge in a neural network
Sequence-Level Knowledge Distillation.
Like What You Like: Knowledge Distill via Neuron Selectivity Transfer.

A Bit Hardware

[1402]. Computing's Energy Porblem (and what we can do about it)

compress-net-notes
compress-net-notes copied to clipboard

Metadata

Terminologies

Network Compression

Reduce Precision

Linear Quantization

Non-linear Quantization

Reduce Number of Operations and Model Size

Exploiting Activation Statistics

Network Pruning

Bayesian network pruning

Compact Network Architectures

Knowledge Distillation

A Bit Hardware

Contributors

← Metadata

Owner

Metadata

compress-net-notes compress-net-notes copied to clipboard

Metadata

Terminologies

Network Compression

Reduce Precision

Linear Quantization

Non-linear Quantization

Reduce Number of Operations and Model Size

Exploiting Activation Statistics

Network Pruning

Bayesian network pruning

Compact Network Architectures

Knowledge Distillation

A Bit Hardware

Contributors

← Metadata

Owner

Metadata

compress-net-notes
compress-net-notes copied to clipboard