title: Compressed Neural Network
date: 2017-10-30 18:16:32
tags:
mathjax: true
Quantized Neural Network
low-precision Quantization
quantized weights only
Replace full-precision (float) weights with low-precision (n-bit) weights.
- Matthieu Courbariaux, Yoshua Bengio, Jean-Pierre David: "BinaryConnect: Training Deep Neural Networks with binary weights during propagations" [NIPS 2015]
- Binary weights in the forward/backward pass, full-precision weights in the update (a minimal sketch appears after this list).
- Jiaxiang Wu, Cong Leng, Yuhang Wang, Qinghao Hu, Jian Cheng: Quantized Convolutional Neural Networks for Mobile Devices [CVPR 2016]
- Quantize conv weights -> fine-tune FC weights -> quantize FC weights.
- Targets mobile-device runtime and memory footprint.
- Chenzhuo Zhu, Song Han, Huizi Mao, William J. Dally: Trained Ternary Quantization [ICLR 2017]
- Aojun Zhou, Anbang Yao, Yiwen Guo, Lin Xu, Yurong Chen: "Incremental Network Quantization: Towards Lossless CNNs with Low-Precision Weights" [ICLR 2017]
- Quantize part of the weights, fine-tune the remaining float weights, and repeat until all weights are quantized.
- 53$\times$ compression rate in combination with DNS.
- Yiwen Guo, Anbang Yao, Hao Zhao, Yurong Chen: Network Sketching: Exploiting Binary Structure in Deep CNNs [CVPR 2017]
- Residual quantization, and residual quantization with refinement.
- Felix Juefei-Xu, Vishnu Naresh Boddeti, Marios Savvides: Local Binary Convolutional Neural Networks [CVPR 2017]
- Shared ternary weight tensor with a filter-wise scale.
- Fewer trainable parameters, which helps against overfitting.
- Training Quantized Nets: A Deeper Understanding [NIPS 2017]
- Learning Accurate Low-Bit Deep Neural Networks with Stochastic Quantization [BMVC 2017]
- LEARNING DISCRETE WEIGHTS USING THE LOCAL REPARAMETERIZATION TRICK [ICLR 2018]
- Variational method.
- An Optimal Control Approach to Deep Learning and Applications to Discrete-Weight Neural Networks [ICML 2018]
- Treats discrete-weight optimization with tools from optimal control theory.
- Deep Neural Network Compression with Single and Multiple Level Quantization [AAAI 2018]
- Loss-aware partition of weights into low-precision and full-precision groups.
- Extends layer-separable quantization to global quantization.
- Cong Leng, Zesheng Dou, Hao Li, Shenghuo Zhu, Rong Jin: Extremely Low Bit Neural Network: Squeeze the Last Bit Out with ADMM [AAAI 2018]
- ADMM: the quantization constraint is added to the loss function.
- LOSS-AWARE WEIGHT QUANTIZATION OF DEEP NETWORKS [ICLR 2018]
- Projected Newton method.
- Keeps float weights during training.
- PROXQUANT: QUANTIZED NEURAL NETWORKS VIA PROXIMAL OPERATORS [ICLR 2019]
- LEARNING RECURRENT BINARY/TERNARY WEIGHTS [ICLR 2019]
- Adds batch normalization to LSTMs to support quantized weights.
- Projection Convolutional Neural Networks for 1-bit CNNs via Discrete Back Propagation [AAAI 2019]
- Simultaneously Optimizing Weight and Quantizer of Ternary Neural Network using Truncated Gaussian Approximation [CVPR 2019]
- Rate Distortion For Model Compression: From Theory To Practice [ICML 2019]
- Same, Same But Different - Recovering Neural Network Quantization Error Through Weight Factorization [ICML 2019]
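For context on the weight-only schemes above, here is a minimal sketch (PyTorch, assumed; not any single paper's reference code) of BinaryConnect-style training: sign(w) is used in the forward/backward pass, while the full-precision latent weights receive the gradient updates through a straight-through estimator and are clipped to [-1, 1].

```python
import torch


class BinarizeSTE(torch.autograd.Function):
    """sign(w) in the forward pass; identity (straight-through) gradient in the backward pass."""

    @staticmethod
    def forward(ctx, w):
        return torch.sign(w)

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output  # pass the gradient straight through to the float weights


class BinaryLinear(torch.nn.Linear):
    """Linear layer that binarizes its float weights on the fly."""

    def forward(self, x):
        w_bin = BinarizeSTE.apply(self.weight)
        return torch.nn.functional.linear(x, w_bin, self.bias)


layer = BinaryLinear(8, 4)
x = torch.randn(2, 8)
layer(x).sum().backward()            # gradients land on the full-precision weights
with torch.no_grad():
    layer.weight.clamp_(-1.0, 1.0)   # BinaryConnect keeps latent weights in [-1, 1]
```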
quantized weights and activations
- Itay Hubara, Matthieu Courbariaux, Daniel Soudry, Ran El-Yaniv, Yoshua Bengio: "Binarized Neural Networks" [NIPS 2016]
- Mohammad Rastegari, Vicente Ordonez, Joseph Redmon, Ali Farhadi: "XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks" [ECCV 2016]
- Activations: slice-wise scale.
- Weights: filter-wise scale.
- AlexNet: 44.2% Top-1, 69.2% Top-5.
- Shuchang Zhou, Zekun Ni, Xinyu Zhou, He Wen, Yuxin Wu, Yuheng Zou: "DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients" [arXiv 2016]
- Weights: tanh + min-max transform to [0,1], then quantize.
- Activations: clip to [0,1], then quantize (a sketch of both quantizers appears after this list).
- Zefan Li, Bingbing Ni, Wenjun Zhang, Xiaokang Yang, Wen Gao: Performance Guaranteed Network Acceleration via High-Order Residual Quantization [ICCV 2017]
- Residual quantization of activations.
- Zhaowei Cai, Xiaodong He, Jian Sun, Nuno Vasconcelos: Deep Learning with Low Precision by Half-Wave Gaussian Quantization [CVPR 2017]
- Approximates activations with a standard (0,1) Gaussian and derives the quantization levels from that distribution.
- Wei Tang, Gang Hua, Liang Wang: How to Train a Compact Binary Neural Network with High Accuracy? [AAAI 2017]
- Residual quantization of activations.
- Replaces ReLU with PReLU.
- Wei Pan, Xiaofan Lin, Cong Zhao: Towards Accurate Binary Convolutional Neural Network [NIPS 2017]
- ALTERNATING MULTI-BIT QUANTIZATION FOR RECURRENT NEURAL NETWORKS [ICLR 2018]
- Alternating quantization to minimize quantization error.
- RNNs only.
- SYQ: Learning Symmetric Quantization For Efficient Deep Neural Networks [CVPR 2018]
- Towards Effective Low-bitwidth Convolutional Neural Networks [CVPR 2018]
- Trains progressively from 32-bit down to 2-bit (32 -> 16 -> 8 -> 4 -> 2).
- Two-Step Quantization for Low-bit Neural Networks [CVPR 2018]
- Quantize activations -> initialize quantized weights -> fine-tune quantized weights.
- Learning Low Precision Deep Neural Networks through Regularization [arXiv 2018]
- Adds a quantization regularization term.
- Bi-Real Net: Enhancing the Performance of 1-bit CNNs With Improved Representational Capability and Advanced Training Algorithm [ECCV 2018]
- Adds an extra shortcut between the two convolutions in each residual block.
- TBN: Convolutional Neural Network with Ternary Inputs and Binary Weights [ECCV 2018]
- LQ-Nets: Learned Quantization for Highly Accurate and Compact Deep Neural Networks [ECCV 2018]
- Alternating quantization to minimize quantization error.
- CNNs.
- Towards Binary-Valued Gates for Robust LSTM Training [ICML 2018]
- Gumbel-Softmax to obtain binarized LSTM gates.
- Heterogeneous Bitwidth Binarization in Convolutional Neural Networks [NIPS 2018]
- HitNet: Hybrid Ternary Recurrent Neural Network [NIPS 2018]
- RELAXED QUANTIZATION FOR DISCRETIZED NEURAL NETWORKS [ICLR 2019]
- Gumbel-Softmax with STE to train discrete weights and activations.
- DEFENSIVE QUANTIZATION: WHEN EFFICIENCY MEETS ROBUSTNESS [ICLR 2019]
- Leverages quantized activations to improve adversarial robustness.
- A SYSTEMATIC STUDY OF BINARY NEURAL NETWORKS’ OPTIMISATION [ICLR 2019]
- How to train BNNs: hyper-parameter settings.
- A Main/Subsidiary Network Framework for Simplifying Binary Neural Networks [CVPR 2019]
- Prunes BNNs with a trained mask.
- Binary Ensemble Neural Network: More Bits per Network or More Networks per Bit? [CVPR 2019]
- Training Quantized Network with Auxiliary Gradient Module [arXiv 2019]
- Similar to knowledge distillation / attention transfer; adds an auxiliary loss to every layer.
- HAQ: Hardware-Aware Automated Quantization [CVPR 2019]
- Reinforcement learning allocates different bit-widths to different layers.
- Structured Binary Neural Networks for Accurate Image Classification and Semantic Segmentation [CVPR 2019]
- Regularizing Activation Distribution for Training Binarized Deep Networks [CVPR 2019]
- Regularizes batch-norm outputs toward the range [-1, 1].
- Learning to Quantize Deep Networks by Optimizing Quantization Intervals with Task Loss [CVPR 2019]
- Learns the quantization intervals jointly with the task loss.
- Matrix and tensor decompositions for training binary neural networks [arXiv 2019]
- Adds capacity to the float weights to obtain more accurate binary weights.
- Learning low-precision neural networks without Straight-Through Estimator (STE) [IJCAI 2019]
- Improving Neural Network Quantization without Retraining using Outlier Channel Splitting [ICML 2019]
- Differentiable Soft Quantization: Bridging Full-Precision and Low-Bit Neural Networks [ICCV 2019]
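As a concrete reference for the DoReFa-Net entry above, here is a sketch (NumPy, assumed; inference-time view only, gradients omitted) of its k-bit weight and activation quantizers: weights go through tanh and a min-max rescaling into [0, 1] before uniform quantization, while activations are simply clipped to [0, 1].

```python
import numpy as np


def quantize_k(x, k):
    """Uniform k-bit quantization of values already in [0, 1]."""
    levels = 2 ** k - 1
    return np.round(x * levels) / levels


def dorefa_weights(w, k):
    """tanh + min-max transform into [0, 1], quantize, then map back to [-1, 1]."""
    t = np.tanh(w)
    w01 = t / (2 * np.abs(t).max()) + 0.5
    return 2 * quantize_k(w01, k) - 1


def dorefa_activations(a, k):
    """Clip activations to [0, 1], then quantize."""
    return quantize_k(np.clip(a, 0.0, 1.0), k)


w = np.random.randn(3, 3)
print(dorefa_weights(w, k=2))                       # values in {-1, -1/3, 1/3, 1}
print(dorefa_activations(np.random.randn(4), k=2))  # values in {0, 1/3, 2/3, 1}
```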
Gradient Quantization && Distributed Training
- TernGrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning [NIPS 2017]
- QSGD: Communication-Efficient SGD via Gradient Quantization and Encoding [NIPS 2017]
- Value-aware Quantization for Training and Inference of Neural Networks [ECCV 2018]
Quantize weights && activation && gradients Simultaneously
- Flexpoint: An Adaptive Numerical Format for Efficient Training of Deep Neural Networks [NIPS 2017]
- TRAINING AND INFERENCE WITH INTEGERS IN DEEP NEURAL NETWORKS [ICLR 2018]
- Training Deep Neural Networks with 8-bit Floating Point Numbers [NIPS 2018]
- Training DNNs with Hybrid Block Floating Point [NIPS 2018]
- ANALYSIS OF QUANTIZED DEEP NETWORKS [ICLR 2019]
Weight-Sharing Quantization
A group of weights shares one value.
- Wenlin Chen, James T. Wilson, Stephen Tyree, Kilian Q. Weinberger, Yixin Chen: Compressing Neural Networks with the Hashing Trick [ICML 2015]
- Compressing Convolutional Neural Networks in the Frequency Domain [KDD 2016]
- DCT transform to the frequency domain.
- Song Han, Huizi Mao, William J. Dally: Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding [ICLR 2016 Best Paper]
- Weights are clustered with k-means and share the cluster centroids (a minimal sketch appears after this list).
- Karen Ullrich, Edward Meeds, Max Welling: Soft Weight-Sharing for Neural Network Compression.[ICLR 2017]
- TOWARDS THE LIMIT OF NETWORK QUANTIZATION [ICLR 2017]
- Deep k-Means: Re-Training and Parameter Sharing with Harder Cluster Assignments for Compressing Deep Convolutions [ICML 2018]
- VARIATIONAL NETWORK QUANTIZATION [ICLR 2018]
- WSNet: Compact and Efficient Networks Through Weight Sampling [ICML 2018]
- Clustering Convolutional Kernels to Compress Deep Neural Networks [ECCV 2018]
- Coreset-Based Neural Network Compression [ECCV 2018]
- Learning Versatile Filters for Efficient Convolutional Neural Networks [NIPS 2018]
- Exploiting Kernel Sparsity and Entropy for Interpretable CNN Compression [CVPR 2019]
- LegoNet: Efficient Convolutional Neural Networks with Lego Filters [ICML 2019]
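To make the weight-sharing idea concrete, here is a sketch (NumPy, assumed; a simplified stand-in rather than the Deep Compression reference code) that clusters a layer's weights with k-means so the layer stores only a small codebook plus per-weight indices.

```python
import numpy as np


def kmeans_share(weights, n_clusters=16, n_iter=20):
    """Cluster weights into n_clusters shared values (linear centroid init, as in Deep Compression)."""
    flat = weights.ravel()
    centroids = np.linspace(flat.min(), flat.max(), n_clusters)
    for _ in range(n_iter):
        # assign each weight to its nearest centroid, then recompute centroids
        idx = np.abs(flat[:, None] - centroids[None, :]).argmin(axis=1)
        for c in range(n_clusters):
            members = flat[idx == c]
            if members.size:
                centroids[c] = members.mean()
    return centroids, idx.reshape(weights.shape)


w = np.random.randn(64, 64)
codebook, assignments = kmeans_share(w, n_clusters=16)
w_shared = codebook[assignments]        # reconstructed layer holds at most 16 distinct values
print(np.unique(w_shared).size)
```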
Pruning Neural Network
- Dong Yu, Frank Seide, Gang Li, Li Deng: EXPLOITING SPARSENESS IN DEEP NEURAL NETWORKS FOR LARGE VOCABULARY SPEECH RECOGNITION [ICASSP 2012]
- Song Han, Jeff Pool, John Tran, William J. Dally: Learning both Weights and Connections for Efficient Neural Networks [NIPS 2015]
- Magnitude-based pruning: removes 89% of AlexNet parameters and 92.5% of VGG parameters (a pruning sketch appears after this list).
- Fast ConvNets Using Group-wise Brain Damage [CVPR 2016]
- Wei Wen, Chunpeng Wu, Yandan Wang, Yiran Chen, Hai Li: Learning Structured Sparsity in Deep Neural Networks [NIPS 2016]
- Real CPU/GPU acceleration via group sparsity.
- Yiwen Guo, Anbang Yao, Yurong Chen: Dynamic Network Surgery for Efficient DNNs [NIPS 2016]
- AlexNet: 17.7x compression rate.
- Fewer training epochs.
- Variational Dropout Sparsifies Deep Neural Networks [ICML 2017]
- Pavlo Molchanov, Stephen Tyree, Tero Karras, Timo Aila, Jan Kautz (NVIDIA): PRUNING CONVOLUTIONAL NEURAL NETWORKS FOR RESOURCE EFFICIENT INFERENCE [ICLR 2017]
- Jian-Hao Luo, Jianxin Wu, Weiyao Lin: ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression [ICCV 2017]
- Layer-by-layer pruning and fine-tuning.
- Learning Efficient Convolutional Networks through Network Slimming [ICCV 2017]
- Regularizes the BN scale factors with an L1 norm to prune channels.
- Runtime Neural Pruning [NIPS 2017]
- Structured Bayesian Pruning via Log-Normal Multiplicative Noise [NIPS 2017]
- Learning to Prune Deep Neural Networks via Layer-wise Optimal Brain Surgeon [NIPS 2017]
- Exploring the Regularity of Sparse Structure in Convolutional Neural Networks [NIPS 2017]
- Recovering from Random Pruning: On the Plasticity of Deep Convolutional Neural Networks [WACV 2018]
- RETHINKING THE SMALLER-NORM-LESS-INFORMATIVE ASSUMPTION IN CHANNEL PRUNING OF CONVOLUTION LAYERS [ICLR 2018]
- LEARNING TO SHARE: SIMULTANEOUS PARAMETER TYING AND SPARSIFICATION IN DEEP LEARNING [ICLR 2018]
- LEARNING SPARSE NEURAL NETWORKS THROUGH L0 REGULARIZATION [ICLR 2018]
- “Learning-Compression” Algorithms for Neural Net Pruning [CVPR 2018]
- PackNet: Adding Multiple Tasks to a Single Network by Iterative Pruning [CVPR 2018]
- NestedNet: Learning Nested Sparse Structures in Deep Neural Networks [CVPR 2018]
- Soft Filter Pruning for Accelerating Deep Convolutional Neural Networks [IJCAI 2018]
- A Systematic DNN Weight Pruning Framework using Alternating Direction Method of Multipliers [ECCV 2018]
- Data-Driven Sparse Structure Selection for Deep Neural Network [ECCV 2018]
- NetAdapt: Platform-Aware Neural Network Adaptation for Mobile Applications [ECCV 2018]
- AMC: AutoML for Model Compression and Acceleration on Mobile Devices [ECCV 2018]
- Discrimination-aware Channel Pruning for Deep Neural Networks [NIPS 2018]
- Synaptic Strength For Convolutional Neural Network [NIPS 2018]
- Learning Sparse Neural Networks via Sensitivity-Driven Regularization [NIPS 2018]
- Frequency-Domain Dynamic Pruning for Convolutional Neural Networks [NIPS 2018]
- TETRIS: TilE-matching the TRemendous Irregular Sparsity [NIPS 2018]
- Balanced Sparsity for Efficient DNN Inference on GPU [AAAI 2019]
- Dynamic Channel Pruning: Feature Boosting and Suppression [ICLR 2019]
- SNIP: SINGLE-SHOT NETWORK PRUNING BASED ON CONNECTION SENSITIVITY [ICLR 2019]
- ENERGY-CONSTRAINED COMPRESSION FOR DEEP NEURAL NETWORKS VIA WEIGHTED SPARSE PROJECTION AND LAYER INPUT MASKING [ICLR 2019]
- The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks [ICLR 2019]
- RePr: Improved Training of Convolutional Filters [CVPR 2019]
- Pruning Filter via Geometric Median for Deep Convolutional Neural Networks Acceleration [CVPR 2019]
- Centripetal SGD for Pruning Very Deep Convolutional Networks with Complicated Structure [CVPR 2019]
- Fully Learnable Group Convolution for Acceleration of Deep Neural Networks [CVPR 2019]
- On Implicit Filter Level Sparsity in Convolutional Neural Networks [CVPR 2019]
- ECC: Platform-Independent Energy-Constrained Deep Neural Network Compression via a Bilinear Regression Model [CVPR 2019]
- Towards Optimal Structured CNN Pruning via Generative Adversarial Learning [CVPR 2019]
- Cascaded Projection: End-to-End Network Compression and Acceleration [CVPR 2019]
- Importance Estimation for Neural Network Pruning [CVPR 2019]
- SeerNet: Predicting Convolutional Neural Network Feature-Map Sparsity through Low-Bit Quantization [CVPR 2019]
- LeGR: Filter Pruning via Learned Global Ranking [arXiv 2019] (code)
- Approximated Oracle Filter Pruning for Destructive CNN Width Optimization [ICML 2019]
- Collaborative Channel Pruning for Deep Networks [ICML 2019]
- MetaPruning: Meta Learning for Automatic Neural Network Channel Pruning [ICCV 2019]
- Co-Evolutionary Compression for Unpaired Image Translation [ICCV 2019]
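A minimal sketch (NumPy, assumed) of the magnitude-based pruning that several of the entries above build on: weights below a magnitude threshold are zeroed out and a binary mask keeps them at zero during fine-tuning.

```python
import numpy as np


def magnitude_prune(weights, sparsity=0.9):
    """Zero out the smallest-magnitude weights until `sparsity` of them are removed."""
    threshold = np.quantile(np.abs(weights), sparsity)
    mask = (np.abs(weights) > threshold).astype(weights.dtype)
    return weights * mask, mask


w = np.random.randn(256, 256)
w_pruned, mask = magnitude_prune(w, sparsity=0.9)
print(1.0 - mask.mean())        # fraction of weights removed, roughly 0.9
# During fine-tuning the update is masked as well, e.g. w = (w - lr * grad) * mask
```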
Matrix Decomposition
- Emily L. Denton, Wojciech Zaremba, Joan Bruna, Yann LeCun, Rob Fergus: Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation [NIPS 2014]
- AlexNet: 2.5x CPU speedup.
- Vadim Lebedev, Yaroslav Ganin, Maksim Rakhuba, Ivan V. Oseledets, Victor S. Lempitsky: Speeding-up Convolutional Neural Networks Using Fine-tuned CP-Decomposition [ICLR 2015]
- CP decomposition splits one conv layer into four lightweight layers.
- Cheng Tai, Tong Xiao, Yi Zhang, Xiaogang Wang, Weinan E: CONVOLUTIONAL NEURAL NETWORKS WITH LOW-RANK REGULARIZATION [ICLR 2016]
- Decomposes a 2D conv into two 1D convs (a rank-1 factorization sketch appears after this list).
- Accelerating Very Deep Convolutional Networks for Classification and Detection [TPAMI 2016]
- Tensor-Train Recurrent Neural Networks for Video Classification [ICML 2017]
- Domain-adaptive deep network compression [ICCV 2017]
- Coordinating Filters for Faster Deep Neural Networks [ICCV 2017]
- Jose M. Alvarez, Mathieu Salzmann: Compression-aware Training of Deep Networks [NIPS 2017]
- Decomposes a 2D conv into two 1D convs, with an activation inserted between the two layers.
- DCFNet: Deep Neural Network with Decomposed Convolutional Filters [ICML 2018]
- Learning Compact Recurrent Neural Networks with Block-Term Tensor Decomposition [CVPR 2018]
- Wide Compression: Tensor Ring Nets [CVPR 2018]
- Extreme Network Compression via Filter Group Approximation [ECCV 2018]
- Trained Rank Pruning for Efficient Deep Neural Networks [arXiv 2018]
- Compressing Recurrent Neural Networks with Tensor Ring for Action Recognition [AAAI 2019]
- ROTDCF: DECOMPOSITION OF CONVOLUTIONAL FILTERS FOR ROTATION-EQUIVARIANT DEEP NETWORKS [ICLR 2019]
- T-Net: Parametrizing Fully Convolutional Nets with a Single High-Order Tensor [CVPR 2019]
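As an illustration of the 2D-to-two-1D decompositions noted above, here is a sketch (NumPy, assumed) that approximates a k x k kernel by a k x 1 filter followed by a 1 x k filter using a rank-1 truncated SVD.

```python
import numpy as np


def separable_approx(kernel):
    """Best rank-1 approximation: a vertical (k x 1) filter times a horizontal (1 x k) filter."""
    u, s, vt = np.linalg.svd(kernel)
    vertical = u[:, :1] * np.sqrt(s[0])
    horizontal = np.sqrt(s[0]) * vt[:1, :]
    return vertical, horizontal


k = np.random.randn(3, 3)
v, h = separable_approx(k)
print(np.linalg.norm(k - v @ h))   # residual of the rank-1 factorization
# Applying v then h as two 1D convolutions costs 2k multiplies per output
# position instead of k*k for the original 2D convolution.
```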
Knowledge Distillation
- Distilling the Knowledge in a Neural Network [2014]
- The student is trained on the teacher's temperature-softened outputs in addition to the hard labels (a loss sketch appears after this list).
- FITNETS: HINTS FOR THIN DEEP NETS [ICLR 2015]
- PAYING MORE ATTENTION TO ATTENTION: IMPROVING THE PERFORMANCE OF CONVOLUTIONAL NEURAL NETWORKS VIA ATTENTION TRANSFER [ICLR 2017]
- Mimicking Very Efficient Network for Object Detection [CVPR 2017]
- A Gift from Knowledge Distillation: Fast Optimization, Network Minimization and Transfer Learning [CVPR 2017]
- DarkRank: Accelerating Deep Metric Learning via Cross Sample Similarities Transfer [AAAI 2018]
- Deep Mutual Learning [CVPR 2018]
- Data Distillation: Towards Omni-Supervised Learning [CVPR 2018]
- Quantization Mimic: Towards Very Tiny CNN for Object Detection [ECCV 2018]
- Self-supervised Knowledge Distillation Using Singular Value Decomposition [ECCV 2018]
- KDGAN: Knowledge Distillation with Generative Adversarial Networks [NIPS 2018]
- Knowledge Distillation by On-the-Fly Native Ensemble [NIPS 2018]
- Paraphrasing Complex Network: Network Compression via Factor Transfer [NIPS 2018]
- Knowledge Transfer via Distillation of Activation Boundaries Formed by Hidden Neurons [AAAI 2019]
- Relational Knowledge Distillation [CVPR 2019]
- Knowledge Distillation via Instance Relationship Graph [CVPR 2019]
- Snapshot Distillation: Teacher-Student Optimization in One Generation [CVPR 2019]
- Learning Metrics from Teachers: Compact Networks for Image Embedding [CVPR 2019]
- LIT: Learned Intermediate Representation Training for Model Compression [ICML 2019]
- Correlation Congruence for Knowledge Distillation [ICCV 2019]
- Similarity-Preserving Knowledge Distillation [ICCV 2019]
- Learning Lightweight Lane Detection CNNs by Self Attention Distillation [ICCV 2019]
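For the classic distillation setup cited at the top of this list, here is a sketch (PyTorch, assumed; hyper-parameters are illustrative) of the usual loss: KL divergence between temperature-softened teacher and student outputs, mixed with the ordinary cross-entropy on hard labels.

```python
import torch
import torch.nn.functional as F


def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """alpha * soft (distillation) term + (1 - alpha) * hard (label) term."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)                              # T^2 keeps gradient magnitudes comparable
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard


student_logits = torch.randn(8, 10, requires_grad=True)
teacher_logits = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
print(distillation_loss(student_logits, teacher_logits, labels))
```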
Compact Model
Efficient CNN
- SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size [arXiv 2016]
- Xception: Deep Learning with Depthwise Separable Convolutions [CVPR 2017]
- MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications [arXiv 2017]
- Depth-wise conv & point-wise conv (a block sketch appears after this list).
- ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices [CVPR 2018]
- group-wise conv & channel shuffle
- Shift: A Zero FLOP, Zero Parameter Alternative to Spatial Convolutions [CVPR 2018]
- MobileNetV2: Inverted Residuals and Linear Bottlenecks [CVPR 2018]
- CondenseNet: An Efficient DenseNet using Learned Group Convolutions [CVPR 2018]
- ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design [ECCV 2018]
- Sparsely Aggregated Convolutional Networks [ECCV 2018]
- Convolutional Networks with Adaptive Inference Graphs [ECCV 2018]
- Real-Time MDNet [ECCV 2018]
- ICNet for Real-Time Semantic Segmentation on High-Resolution Images [ECCV 2018]
- BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation [ECCV 2018]
- Constructing Fast Network through Deconstruction of Convolution [NIPS 2018]
- ChannelNets: Compact and Efficient Convolutional Neural Networks via Channel-Wise Convolutions [NIPS 2018]
- HetConv: Heterogeneous Kernel-Based Convolutions for Deep CNNs [CVPR 2019]
- Adaptively Connected Neural Networks [CVPR 2019]
- DFANet: Deep Feature Aggregation for Real-Time Semantic Segmentation [CVPR 2019]
- All You Need is a Few Shifts: Designing Efficient Convolutional Neural Networks for Image Classification [CVPR 2019]
- Searching for MobileNetV3 [arXiv 2019]
- EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks [ICML 2019]
- Efficient On-Device Models using Neural Projections [ICML 2019]
- Drop an Octave: Reducing Spatial Redundancy in Convolutional Neural Networks with Octave Convolution [ICCV 2019]
- ACNet: Strengthening the Kernel Skeletons for Powerful CNN via Asymmetric Convolution Blocks [ICCV 2019]
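To make the depth-wise/point-wise note under MobileNets concrete, here is a sketch (PyTorch, assumed; batch norm and nonlinearities omitted) of the separable block: a 3x3 convolution with groups equal to the channel count, followed by a 1x1 convolution.

```python
import torch
import torch.nn as nn


class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        # depth-wise: one 3x3 filter per input channel (groups = in_ch)
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, stride=stride,
                                   padding=1, groups=in_ch, bias=False)
        # point-wise: 1x1 conv mixes information across channels
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))


block = DepthwiseSeparableConv(32, 64)
dense = nn.Conv2d(32, 64, kernel_size=3, padding=1, bias=False)
count = lambda m: sum(p.numel() for p in m.parameters())
print(block(torch.randn(1, 32, 56, 56)).shape)   # torch.Size([1, 64, 56, 56])
print(count(block), "vs", count(dense))          # 2336 vs 18432 parameters
```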
Efficient RNN variants
- LightRNN: Memory and Computation-Efficient Recurrent Neural Networks [NIPS 2016]
- QUASI-RECURRENT NEURAL NETWORKS [ICLR 2017]
- COMPRESSING WORD EMBEDDINGS VIA DEEP COMPOSITIONAL CODE LEARNING [ICLR 2018]
- Simple Recurrent Units for Highly Parallelizable Recurrence [EMNLP 2018]
- Fully Neural Network Based Speech Recognition on Mobile and Embedded Devices [NIPS 2018]
NAS
- DARTS: DIFFERENTIABLE ARCHITECTURE SEARCH [ICLR 2019]