title: Compressed Neural Network
date: 2017-10-30 18:16:32
tags:
mathjax: true
Quantized Neural Network
low-precision Quantization
quantized weights only
Replace full-precision (float) weights with low-precision (n-bit) weights.
- Matthieu Courbariaux, Yoshua Bengio, Jean-Pierre David: "BinaryConnect: Training Deep Neural Networks with binary weights during propagations" [NIPS 2015]
- Binary weights in the forward/backward pass, full-precision weights in the update (a minimal sketch appears after this list).
- Jiaxiang Wu, Cong Leng, Yuhang Wang, Qinghao Hu, Jian Cheng: Quantized Convolutional Neural Networks for Mobile Devices [CVPR 2016]
- Quantize conv weights -> fine-tune FC weights -> quantize FC weights.
- Targets mobile-device runtime and memory footprint.
- Chenzhuo Zhu, Song Han, Huizi Mao, William J. Dally: Trained Ternary Quantization [ICLR 2017]
- Aojun Zhou, Anbang Yao, Yiwen Guo, Lin Xu, Yurong Chen: "Incremental Network Quantization: Towards Lossless CNNs with Low-Precision Weights" [ICLR 2017]
- Quantize part of the weights, fine-tune the remaining float weights, and repeat until all weights are quantized.
- 53$\times$ compression rate in combination with DNS.
- Yiwen Guo, Anbang Yao, Hao Zhao, Yurong Chen: Network Sketching: Exploiting Binary Structure in Deep CNNs [CVPR 2017]
- Residual quantization, and residual quantization with refinement.
- Felix Juefei-Xu, Vishnu Naresh Boddeti, Marios Savvides: Local Binary Convolutional Neural Networks [CVPR 2017]
- Shared ternary weight tensor with a filter-wise scale.
- Fewer trainable parameters, which helps against overfitting.
- Training Quantized Nets: A Deeper Understanding [NIPS 2017]
- Learning Accurate Low-Bit Deep Neural Networks with Stochastic Quantization [BMVC 2017]
- LEARNING DISCRETE WEIGHTS USING THE LOCAL REPARAMETERIZATION TRICK [ICLR 2018]
- Variational method.
- An Optimal Control Approach to Deep Learning and Applications to Discrete-Weight Neural Networks [ICML 2018]
- Treats discrete-weight optimization with tools from optimal control theory.
- Deep Neural Network Compression with Single and Multiple Level Quantization [AAAI 2018]
- Loss-aware partition of weights into low-precision and full-precision groups.
- Extends layer-separable quantization to global quantization.
- Cong Leng, Zesheng Dou, Hao Li, Shenghuo Zhu, Rong Jin: Extremely Low Bit Neural Network: Squeeze the Last Bit Out with ADMM [AAAI 2018]
- ADMM: the quantization constraint is added to the loss function.
- LOSS-AWARE WEIGHT QUANTIZATION OF DEEP NETWORKS [ICLR 2018]
- Projected Newton method.
- Keeps float weights during training.
- PROXQUANT: QUANTIZED NEURAL NETWORKS VIA PROXIMAL OPERATORS [ICLR 2019]
- LEARNING RECURRENT BINARY/TERNARY WEIGHTS [ICLR 2019]
- Adds batch normalization to LSTMs to support quantized weights.
- Projection Convolutional Neural Networks for 1-bit CNNs via Discrete Back Propagation [AAAI 2019]
- Simultaneously Optimizing Weight and Quantizer of Ternary Neural Network using Truncated Gaussian Approximation [CVPR 2019]
- Rate Distortion For Model Compression: From Theory To Practice [ICML 2019]
- Same, Same But Different - Recovering Neural Network Quantization Error Through Weight Factorization [ICML 2019]
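For context on the weight-only schemes above, here is a minimal sketch (PyTorch, assumed; not any single paper's reference code) of BinaryConnect-style training: sign(w) is used in the forward/backward pass, while the full-precision latent weights receive the gradient updates through a straight-through estimator and are clipped to [-1, 1].

```python
import torch


class BinarizeSTE(torch.autograd.Function):
    """sign(w) in the forward pass; identity (straight-through) gradient in the backward pass."""

    @staticmethod
    def forward(ctx, w):
        return torch.sign(w)

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output  # pass the gradient straight through to the float weights


class BinaryLinear(torch.nn.Linear):
    """Linear layer that binarizes its float weights on the fly."""

    def forward(self, x):
        w_bin = BinarizeSTE.apply(self.weight)
        return torch.nn.functional.linear(x, w_bin, self.bias)


layer = BinaryLinear(8, 4)
x = torch.randn(2, 8)
layer(x).sum().backward()            # gradients land on the full-precision weights
with torch.no_grad():
    layer.weight.clamp_(-1.0, 1.0)   # BinaryConnect keeps latent weights in [-1, 1]
```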
quantized weights and activations
- Itay Hubara, Matthieu Courbariaux, Daniel Soudry, Ran El-Yaniv, Yoshua Bengio: "Binarized Neural Networks" [NIPS 2016]
- Mohammad Rastegari, Vicente Ordonez, Joseph Redmon, Ali Farhadi: "XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks" [ECCV 2016]
- Activations: slice-wise scale.
- Weights: filter-wise scale.
- AlexNet: 44.2% Top-1, 69.2% Top-5.
- Shuchang Zhou, Zekun Ni, Xinyu Zhou, He Wen, Yuxin Wu, Yuheng Zou: "DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients" [arXiv 2016]
- Weights: tanh + min-max transform to [0,1], then quantize.
- Activations: clip to [0,1], then quantize (a sketch of both quantizers appears after this list).
- Zefan Li, Bingbing Ni, Wenjun Zhang, Xiaokang Yang, Wen Gao: Performance Guaranteed Network Acceleration via High-Order Residual Quantization [ICCV 2017]
- Residual quantization of activations.
- Zhaowei Cai, Xiaodong He, Jian Sun, Nuno Vasconcelos: Deep Learning with Low Precision by Half-Wave Gaussian Quantization [CVPR 2017]
- Approximates activations with a standard (0,1) Gaussian and derives the quantization levels from that distribution.
- Wei Tang, Gang Hua, Liang Wang: How to Train a Compact Binary Neural Network with High Accuracy? [AAAI 2017]
- Residual quantization of activations.
- Replaces ReLU with PReLU.
- Wei Pan, Xiaofan Lin, Cong Zhao: Towards Accurate Binary Convolutional Neural Network [NIPS 2017]
- ALTERNATING MULTI-BIT QUANTIZATION FOR RECURRENT NEURAL NETWORKS [ICLR 2018]
- Alternating quantization to minimize quantization error.
- RNNs only.
- SYQ: Learning Symmetric Quantization For Efficient Deep Neural Networks [CVPR 2018]
- Towards Effective Low-bitwidth Convolutional Neural Networks [CVPR 2018]
- Trains progressively from 32-bit down to 2-bit (32 -> 16 -> 8 -> 4 -> 2).
- Two-Step Quantization for Low-bit Neural Networks [CVPR 2018]
- Quantize activations -> initialize quantized weights -> fine-tune quantized weights.
- Learning Low Precision Deep Neural Networks through Regularization [arXiv 2018]
- Adds a quantization regularization term.
- Bi-Real Net: Enhancing the Performance of 1-bit CNNs With Improved Representational Capability and Advanced Training Algorithm [ECCV 2018]
- Adds an extra shortcut between the two convolutions in each residual block.
- TBN: Convolutional Neural Network with Ternary Inputs and Binary Weights [ECCV 2018]
- LQ-Nets: Learned Quantization for Highly Accurate and Compact Deep Neural Networks [ECCV 2018]
- Alternating quantization to minimize quantization error.
- CNNs.
- Towards Binary-Valued Gates for Robust LSTM Training [ICML 2018]
- Gumbel-Softmax to obtain binarized LSTM gates.
- Heterogeneous Bitwidth Binarization in Convolutional Neural Networks [NIPS 2018]
- HitNet: Hybrid Ternary Recurrent Neural Network [NIPS 2018]
- RELAXED QUANTIZATION FOR DISCRETIZED NEURAL NETWORKS [ICLR 2019]
- Gumbel-Softmax with STE to train discrete weights and activations.
- DEFENSIVE QUANTIZATION: WHEN EFFICIENCY MEETS ROBUSTNESS [ICLR 2019]
- Leverages quantized activations to improve adversarial robustness.
- A SYSTEMATIC STUDY OF BINARY NEURAL NETWORKS’ OPTIMISATION [ICLR 2019]
- How to train BNNs: hyper-parameter settings.
- A Main/Subsidiary Network Framework for Simplifying Binary Neural Networks [CVPR 2019]
- Prunes BNNs with a trained mask.
- Binary Ensemble Neural Network: More Bits per Network or More Networks per Bit? [CVPR 2019]
- Training Quantized Network with Auxiliary Gradient Module [arXiv 2019]
- Similar to knowledge distillation / attention transfer; adds an auxiliary loss to every layer.
- HAQ: Hardware-Aware Automated Quantization [CVPR 2019]
- Reinforcement learning allocates different bit-widths to different layers.
- Structured Binary Neural Networks for Accurate Image Classification and Semantic Segmentation [CVPR 2019]
- Regularizing Activation Distribution for Training Binarized Deep Networks [CVPR 2019]
- Regularizes batch-norm outputs toward the range [-1, 1].
- Learning to Quantize Deep Networks by Optimizing Quantization Intervals with Task Loss [CVPR 2019]
- Learns the quantization intervals jointly with the task loss.
- Matrix and tensor decompositions for training binary neural networks [arXiv 2019]
- Adds capacity to the float weights to obtain more accurate binary weights.
- Learning low-precision neural networks without Straight-Through Estimator (STE) [IJCAI 2019]
- Improving Neural Network Quantization without Retraining using Outlier Channel Splitting [ICML 2019]
- Differentiable Soft Quantization: Bridging Full-Precision and Low-Bit Neural Networks [ICCV 2019]
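As a concrete reference for the DoReFa-Net entry above, here is a sketch (NumPy, assumed; inference-time view only, gradients omitted) of its k-bit weight and activation quantizers: weights go through tanh and a min-max rescaling into [0, 1] before uniform quantization, while activations are simply clipped to [0, 1].

```python
import numpy as np


def quantize_k(x, k):
    """Uniform k-bit quantization of values already in [0, 1]."""
    levels = 2 ** k - 1
    return np.round(x * levels) / levels


def dorefa_weights(w, k):
    """tanh + min-max transform into [0, 1], quantize, then map back to [-1, 1]."""
    t = np.tanh(w)
    w01 = t / (2 * np.abs(t).max()) + 0.5
    return 2 * quantize_k(w01, k) - 1


def dorefa_activations(a, k):
    """Clip activations to [0, 1], then quantize."""
    return quantize_k(np.clip(a, 0.0, 1.0), k)


w = np.random.randn(3, 3)
print(dorefa_weights(w, k=2))                       # values in {-1, -1/3, 1/3, 1}
print(dorefa_activations(np.random.randn(4), k=2))  # values in {0, 1/3, 2/3, 1}
```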
Gradient Quantization && Distributed Training
- TernGrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning [NIPS 2017]
- QSGD: Communication-Efficient SGD via Gradient Quantization and Encoding [NIPS 2017]
- Value-aware Quantization for Training and Inference of Neural Networks [ECCV 2018]
Quantize weights && activation && gradients Simultaneously
- Flexpoint: An Adaptive Numerical Format for Efficient Training of Deep Neural Networks [NIPS 2017]
- TRAINING AND INFERENCE WITH INTEGERS IN DEEP NEURAL NETWORKS [ICLR 2018]
- Training Deep Neural Networks with 8-bit Floating Point Numbers [NIPS 2018]
- Training DNNs with Hybrid Block Floating Point [NIPS 2018]
- ANALYSIS OF QUANTIZED DEEP NETWORKS [ICLR 2019]
Weight-Sharing Quantization
A group of weights shares one value.
- Wenlin Chen, James T. Wilson, Stephen Tyree, Kilian Q. Weinberger, Yixin Chen: Compressing Neural Networks with the Hashing Trick [ICML 2015]
- Compressing Convolutional Neural Networks in the Frequency Domain [KDD 2016]
- DCT transform to the frequency domain.
- Song Han, Huizi Mao, William J. Dally: Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding [ICLR 2016 Best Paper]
- Weights are clustered with k-means and share the cluster centroids (a minimal sketch appears after this list).
- Karen Ullrich, Edward Meeds, Max Welling: Soft Weight-Sharing for Neural Network Compression.[ICLR 2017]
- TOWARDS THE LIMIT OF NETWORK QUANTIZATION [ICLR 2017]
- Deep k-Means: Re-Training and Parameter Sharing with Harder Cluster Assignments for Compressing Deep Convolutions [ICML 2018]
- VARIATIONAL NETWORK QUANTIZATION [ICLR 2018]
- WSNet: Compact and Efficient Networks Through Weight Sampling [ICML 2018]
- Clustering Convolutional Kernels to Compress Deep Neural Networks [ECCV 2018]
- Coreset-Based Neural Network Compression [ECCV 2018]
- Learning Versatile Filters for Efficient Convolutional Neural Networks [NIPS 2018]
- Exploiting Kernel Sparsity and Entropy for Interpretable CNN Compression [CVPR 2019]
- LegoNet: Efficient Convolutional Neural Networks with Lego Filters [ICML 2019]
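To make the weight-sharing idea concrete, here is a sketch (NumPy, assumed; a simplified stand-in rather than the Deep Compression reference code) that clusters a layer's weights with k-means so the layer stores only a small codebook plus per-weight indices.

```python
import numpy as np


def kmeans_share(weights, n_clusters=16, n_iter=20):
    """Cluster weights into n_clusters shared values (linear centroid init, as in Deep Compression)."""
    flat = weights.ravel()
    centroids = np.linspace(flat.min(), flat.max(), n_clusters)
    for _ in range(n_iter):
        # assign each weight to its nearest centroid, then recompute centroids
        idx = np.abs(flat[:, None] - centroids[None, :]).argmin(axis=1)
        for c in range(n_clusters):
            members = flat[idx == c]
            if members.size:
                centroids[c] = members.mean()
    return centroids, idx.reshape(weights.shape)


w = np.random.randn(64, 64)
codebook, assignments = kmeans_share(w, n_clusters=16)
w_shared = codebook[assignments]        # reconstructed layer holds at most 16 distinct values
print(np.unique(w_shared).size)
```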
Pruning Neural Network
- Dong Yu, Frank Seide, Gang Li, Li Deng: EXPLOITING SPARSENESS IN DEEP NEURAL NETWORKS FOR LARGE VOCABULARY SPEECH RECOGNITION [ICASSP 2012]
- Song Han, Jeff Pool, John Tran, William J. Dally: Learning both Weights and Connections for Efficient Neural Networks [NIPS 2015]
- Magnitude-based pruning: removes 89% of AlexNet parameters and 92.5% of VGG parameters (a pruning sketch appears after this list).
- Fast ConvNets Using Group-wise Brain Damage [CVPR 2016]
- Wei Wen, Chunpeng Wu, Yandan Wang, Yiran Chen, Hai Li: Learning Structured Sparsity in Deep Neural Networks [NIPS 2016]
- Real CPU/GPU acceleration via group sparsity.
- Yiwen Guo, Anbang Yao, Yurong Chen: Dynamic Network Surgery for Efficient DNNs [NIPS 2016]
- AlexNet: 17.7x compression rate.
- Fewer training epochs.
- Variational Dropout Sparsifies Deep Neural Networks [ICML 2017]
- Pavlo Molchanov, Stephen Tyree, Tero Karras, Timo Aila, Jan Kautz (NVIDIA): PRUNING CONVOLUTIONAL NEURAL NETWORKS FOR RESOURCE EFFICIENT INFERENCE [ICLR 2017]
- Jian-Hao Luo, Jianxin Wu, Weiyao Lin: ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression [ICCV 2017]
- Layer-by-layer pruning and fine-tuning.
- Learning Efficient Convolutional Networks through Network Slimming [ICCV 2017]
- Regularizes the BN scale factors with an L1 norm to prune channels.
- Runtime Neural Pruning [NIPS 2017]
- Structured Bayesian Pruning via Log-Normal Multiplicative Noise [NIPS 2017]
- Learning to Prune Deep Neural Networks via Layer-wise Optimal Brain Surgeon [NIPS 2017]
- Exploring the Regularity of Sparse Structure in Convolutional Neural Networks [NIPS 2017]
- Recovering from Random Pruning: On the Plasticity of Deep Convolutional Neural Networks [WACV 2018]
- RETHINKING THE SMALLER-NORM-LESS-INFORMATIVE ASSUMPTION IN CHANNEL PRUNING OF CONVOLUTION LAYERS [ICLR 2018]
- LEARNING TO SHARE: SIMULTANEOUS PARAMETER TYING AND SPARSIFICATION IN DEEP LEARNING [ICLR 2018]
- LEARNING SPARSE NEURAL NETWORKS THROUGH L0 REGULARIZATION [ICLR 2018]
- “Learning-Compression” Algorithms for Neural Net Pruning [CVPR 2018]
- PackNet: Adding Multiple Tasks to a Single Network by Iterative Pruning [CVPR 2018]
- NestedNet: Learning Nested Sparse Structures in Deep Neural Networks [CVPR 2018]
- Soft Filter Pruning for Accelerating Deep Convolutional Neural Networks [IJCAI 2018]
- A Systematic DNN Weight Pruning Framework using Alternating Direction Method of Multipliers [ECCV 2018]
- Data-Driven Sparse Structure Selection for Deep Neural Network [ECCV 2018]
- NetAdapt: Platform-Aware Neural Network Adaptation for Mobile Applications [ECCV 2018]
- AMC: AutoML for Model Compression and Acceleration on Mobile Devices [ECCV 2018]
- Discrimination-aware Channel Pruning for Deep Neural Networks [NIPS 2018]
- Synaptic Strength For Convolutional Neural Network [NIPS 2018]
- Learning Sparse Neural Networks via Sensitivity-Driven Regularization [NIPS 2018]
- Frequency-Domain Dynamic Pruning for Convolutional Neural Networks [NIPS 2018]
- TETRIS: TilE-matching the TRemendous Irregular Sparsity [NIPS 2018]
- Balanced Sparsity for Efficient DNN Inference on GPU [AAAI 2019]
- Dynamic Channel Pruning: Feature Boosting and Suppression [ICLR 2019]
- SNIP: SINGLE-SHOT NETWORK PRUNING BASED ON CONNECTION SENSITIVITY [ICLR 2019]
- ENERGY-CONSTRAINED COMPRESSION FOR DEEP NEURAL NETWORKS VIA WEIGHTED SPARSE PROJECTION AND LAYER INPUT MASKING [ICLR 2019]
- The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks [ICLR 2019]
- RePr: Improved Training of Convolutional Filters [CVPR 2019]
- Pruning Filter via Geometric Median for Deep Convolutional Neural Networks Acceleration [CVPR 2019]
- Centripetal SGD for Pruning Very Deep Convolutional Networks with Complicated Structure [CVPR 2019]
- Fully Learnable Group Convolution for Acceleration of Deep Neural Networks [CVPR 2019]
- On Implicit Filter Level Sparsity in Convolutional Neural Networks [CVPR 2019]
- ECC: Platform-Independent Energy-Constrained Deep Neural Network Compression via a Bilinear Regression Model [CVPR 2019]
- Towards Optimal Structured CNN Pruning via Generative Adversarial Learning [CVPR 2019]
- Cascaded Projection: End-to-End Network Compression and Acceleration [CVPR 2019]
- Importance Estimation for Neural Network Pruning [CVPR 2019]
- SeerNet: Predicting Convolutional Neural Network Feature-Map Sparsity through Low-Bit Quantization [CVPR 2019]
- LeGR: Filter Pruning via Learned Global Ranking [arXiv 2019] (code)
- Approximated Oracle Filter Pruning for Destructive CNN Width Optimization [ICML 2019]
- Collaborative Channel Pruning for Deep Networks [ICML 2019]
- MetaPruning: Meta Learning for Automatic Neural Network Channel Pruning [ICCV 2019]
- Co-Evolutionary Compression for Unpaired Image Translation [ICCV 2019]
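A minimal sketch (NumPy, assumed) of the magnitude-based pruning that several of the entries above build on: weights below a magnitude threshold are zeroed out and a binary mask keeps them at zero during fine-tuning.

```python
import numpy as np


def magnitude_prune(weights, sparsity=0.9):
    """Zero out the smallest-magnitude weights until `sparsity` of them are removed."""
    threshold = np.quantile(np.abs(weights), sparsity)
    mask = (np.abs(weights) > threshold).astype(weights.dtype)
    return weights * mask, mask


w = np.random.randn(256, 256)
w_pruned, mask = magnitude_prune(w, sparsity=0.9)
print(1.0 - mask.mean())        # fraction of weights removed, roughly 0.9
# During fine-tuning the update is masked as well, e.g. w = (w - lr * grad) * mask
```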
Matrix Decomposition
- Emily L. Denton, Wojciech Zaremba, Joan Bruna, Yann LeCun, Rob Fergus: Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation [NIPS 2014]
- AlexNet: 2.5x CPU speedup.
- Vadim Lebedev, Yaroslav Ganin, Maksim Rakhuba, Ivan V. Oseledets, Victor S. Lempitsky: Speeding-up Convolutional Neural Networks Using Fine-tuned CP-Decomposition [ICLR 2015]
- CP decomposition splits one conv layer into four lightweight layers.
- Cheng Tai, Tong Xiao, Yi Zhang, Xiaogang Wang, Weinan E: CONVOLUTIONAL NEURAL NETWORKS WITH LOW-RANK REGULARIZATION [ICLR 2016]
- Decomposes a 2D conv into two 1D convs (a rank-1 factorization sketch appears after this list).
- Accelerating Very Deep Convolutional Networks for Classification and Detection [TPAMI 2016]
- Tensor-Train Recurrent Neural Networks for Video Classification [ICML 2017]
- Domain-adaptive deep network compression [ICCV 2017]
- Coordinating Filters for Faster Deep Neural Networks [ICCV 2017]
- Jose M. Alvarez, Mathieu Salzmann: Compression-aware Training of Deep Networks [NIPS 2017]
- Decomposes a 2D conv into two 1D convs, with an activation inserted between the two layers.
- DCFNet: Deep Neural Network with Decomposed Convolutional Filters [ICML 2018]
- Learning Compact Recurrent Neural Networks with Block-Term Tensor Decomposition [CVPR 2018]
- Wide Compression: Tensor Ring Nets [CVPR 2018]
- Extreme Network Compression via Filter Group Approximation [ECCV 2018]
- Trained Rank Pruning for Efficient Deep Neural Networks [arXiv 2018]
- Compressing Recurrent Neural Networks with Tensor Ring for Action Recognition [AAAI 2019]
- ROTDCF: DECOMPOSITION OF CONVOLUTIONAL FILTERS FOR ROTATION-EQUIVARIANT DEEP NETWORKS [ICLR 2019]
- T-Net: Parametrizing Fully Convolutional Nets with a Single High-Order Tensor [CVPR 2019]
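As an illustration of the 2D-to-two-1D decompositions noted above, here is a sketch (NumPy, assumed) that approximates a k x k kernel by a k x 1 filter followed by a 1 x k filter using a rank-1 truncated SVD.

```python
import numpy as np


def separable_approx(kernel):
    """Best rank-1 approximation: a vertical (k x 1) filter times a horizontal (1 x k) filter."""
    u, s, vt = np.linalg.svd(kernel)
    vertical = u[:, :1] * np.sqrt(s[0])
    horizontal = np.sqrt(s[0]) * vt[:1, :]
    return vertical, horizontal


k = np.random.randn(3, 3)
v, h = separable_approx(k)
print(np.linalg.norm(k - v @ h))   # residual of the rank-1 factorization
# Applying v then h as two 1D convolutions costs 2k multiplies per output
# position instead of k*k for the original 2D convolution.
```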
Knowledge Distillation
- Distilling the Knowledge in a Neural Network [2014]
- The student is trained on the teacher's temperature-softened outputs in addition to the hard labels (a loss sketch appears after this list).
- FITNETS: HINTS FOR THIN DEEP NETS [ICLR 2015]
- PAYING MORE ATTENTION TO ATTENTION: IMPROVING THE PERFORMANCE OF CONVOLUTIONAL NEURAL NETWORKS VIA ATTENTION TRANSFER [ICLR 2017]
- Mimicking Very Efficient Network for Object Detection [CVPR 2017]
- A Gift from Knowledge Distillation: Fast Optimization, Network Minimization and Transfer Learning [CVPR 2017]
- DarkRank: Accelerating Deep Metric Learning via Cross Sample Similarities Transfer [AAAI 2018]
- Deep Mutual Learning [CVPR 2018]
- Data Distillation: Towards Omni-Supervised Learning [CVPR 2018]
- Quantization Mimic: Towards Very Tiny CNN for Object Detection [ECCV 2018]
- Self-supervised Knowledge Distillation Using Singular Value Decomposition [ECCV 2018]
- KDGAN: Knowledge Distillation with Generative Adversarial Networks [NIPS 2018]
- Knowledge Distillation by On-the-Fly Native Ensemble [NIPS 2018]
- Paraphrasing Complex Network: Network Compression via Factor Transfer [NIPS 2018]
- Knowledge Transfer via Distillation of Activation Boundaries Formed by Hidden Neurons [AAAI 2019]
- Relational Knowledge Distillation [CVPR 2019]
- Knowledge Distillation via Instance Relationship Graph [CVPR 2019]
- Snapshot Distillation: Teacher-Student Optimization in One Generation [CVPR 2019]
- Learning Metrics from Teachers: Compact Networks for Image Embedding [CVPR 2019]
- LIT: Learned Intermediate Representation Training for Model Compression [ICML 2019]
- Correlation Congruence for Knowledge Distillation [ICCV 2019]
- Similarity-Preserving Knowledge Distillation [ICCV 2019]
- Learning Lightweight Lane Detection CNNs by Self Attention Distillation [ICCV 2019]
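For the classic distillation setup cited at the top of this list, here is a sketch (PyTorch, assumed; hyper-parameters are illustrative) of the usual loss: KL divergence between temperature-softened teacher and student outputs, mixed with the ordinary cross-entropy on hard labels.

```python
import torch
import torch.nn.functional as F


def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """alpha * soft (distillation) term + (1 - alpha) * hard (label) term."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)                              # T^2 keeps gradient magnitudes comparable
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard


student_logits = torch.randn(8, 10, requires_grad=True)
teacher_logits = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
print(distillation_loss(student_logits, teacher_logits, labels))
```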
Compact Model
Efficient CNN
- SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size [arXiv 2016]
- Xception: Deep Learning with Depthwise Separable Convolutions [CVPR 2017]
- MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications [arXiv 2017]
- Depth-wise conv & point-wise conv (a block sketch appears after this list).
- ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices [CVPR 2018]
- group-wise conv & channel shuffle
- Shift: A Zero FLOP, Zero Parameter Alternative to Spatial Convolutions [CVPR 2018]
- MobileNetV2: Inverted Residuals and Linear Bottlenecks [CVPR 2018]
- CondenseNet: An Efficient DenseNet using Learned Group Convolutions [CVPR 2018]
- ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design [ECCV 2018]
- Sparsely Aggregated Convolutional Networks [ECCV 2018]
- Convolutional Networks with Adaptive Inference Graphs [ECCV 2018]
- Real-Time MDNet [ECCV 2018]
- ICNet for Real-Time Semantic Segmentation on High-Resolution Images [ECCV 2018]
- BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation [ECCV 2018]
- Constructing Fast Network through Deconstruction of Convolution [NIPS 2018]
- ChannelNets: Compact and Efficient Convolutional Neural Networks via Channel-Wise Convolutions [NIPS 2018]
- HetConv: Heterogeneous Kernel-Based Convolutions for Deep CNNs [CVPR 2019]
- Adaptively Connected Neural Networks [CVPR 2019]
- DFANet: Deep Feature Aggregation for Real-Time Semantic Segmentation [CVPR 2019]
- All You Need is a Few Shifts: Designing Efficient Convolutional Neural Networks for Image Classification [CVPR 2019]
- Searching for MobileNetV3 [arXiv 2019]
- EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks [ICML 2019]
- Efficient On-Device Models using Neural Projections [ICML 2019]
- Drop an Octave: Reducing Spatial Redundancy in Convolutional Neural Networks with Octave Convolution [ICCV 2019]
- ACNet: Strengthening the Kernel Skeletons for Powerful CNN via Asymmetric Convolution Blocks [ICCV 2019]
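To make the depth-wise/point-wise note under MobileNets concrete, here is a sketch (PyTorch, assumed; batch norm and nonlinearities omitted) of the separable block: a 3x3 convolution with groups equal to the channel count, followed by a 1x1 convolution.

```python
import torch
import torch.nn as nn


class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        # depth-wise: one 3x3 filter per input channel (groups = in_ch)
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, stride=stride,
                                   padding=1, groups=in_ch, bias=False)
        # point-wise: 1x1 conv mixes information across channels
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))


block = DepthwiseSeparableConv(32, 64)
dense = nn.Conv2d(32, 64, kernel_size=3, padding=1, bias=False)
count = lambda m: sum(p.numel() for p in m.parameters())
print(block(torch.randn(1, 32, 56, 56)).shape)   # torch.Size([1, 64, 56, 56])
print(count(block), "vs", count(dense))          # 2336 vs 18432 parameters
```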
Efficient RNN variants
- LightRNN: Memory and Computation-Efficient Recurrent Neural Networks [NIPS 2016]
- QUASI-RECURRENT NEURAL NETWORKS [ICLR 2017]
- COMPRESSING WORD EMBEDDINGS VIA DEEP COMPOSITIONAL CODE LEARNING [ICLR 2018]
- Simple Recurrent Units for Highly Parallelizable Recurrence [EMNLP 2018]
- Fully Neural Network Based Speech Recognition on Mobile and Embedded Devices [NIPS 2018]
NAS
- DARTS: DIFFERENTIABLE ARCHITECTURE SEARCH [ICLR 2019]