
Awesome Dynamic Networks and Conditional Computation



Upcoming ICML 2022 workshop on Dynamic Neural Networks on Friday, July 22: https://dynn-icml2022.github.io/



Overview of conditional computation and dynamic CNNs for computer vision, focusing on reducing the computational cost of existing network architectures. In contrast to static networks, dynamic networks disable parts of the network at inference time, based on the input image. This can save computation and speed up inference, for example by processing easy images with fewer operations. Note that this list mainly focuses on methods that reduce the computational cost of existing models (e.g. ResNet models), and does not cover all methods that use dynamic computation in custom architectures.

This list is growing every day. If a method is missing or listed incorrectly, let me know by opening a GitHub issue or pull request!

Here is a list with more static and dynamic methods for efficient CNNs.

Background

Methods have three important distinguishing factors:

  • The method's architecture, e.g. skipping layers or pixels, and whether these run-or-skip decisions are made by a separate policy network, a submodule in the network, or another mechanism.
  • The way of training the policy, e.g. using reinforcement learning, a gradient estimator such as Gumbel-Softmax, or a custom approach.
  • The implementation of the method, and whether it can be executed efficiently on existing platforms (i.e. whether the method actually speeds up inference, or only reduces the theoretical number of computations).
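
To make the second point concrete, here is a minimal sketch (my own illustration, not any specific paper's method) of how a binary run-or-skip policy can be trained with the Gumbel-Softmax straight-through estimator in PyTorch; the 50% sparsity target is an arbitrary example:

```python
import torch
import torch.nn.functional as F

# Hypothetical policy head output: per-sample logits for [skip, execute]
logits = torch.randn(8, 2, requires_grad=True)

# Training: hard=True forwards a discrete one-hot sample but
# backpropagates through the soft Gumbel-Softmax relaxation
# (the straight-through estimator).
decision = F.gumbel_softmax(logits, tau=1.0, hard=True)[:, 1]  # 1 = execute

# Inference: deterministic hard decision.
decision_eval = logits.argmax(dim=1).float()

# A sparsity loss steers the expected execution rate toward a target,
# here 50% of units executed on average.
p_execute = logits.softmax(dim=1)[:, 1]
sparsity_loss = (p_execute.mean() - 0.5) ** 2
```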

Metrics: Most methods report the reduction in computation (measured in floating-point operations, FLOPS) against the loss in accuracy. Papers typically show figures where baseline models of different complexities (e.g. obtained by reducing the number of channels) are compared to the method applied to the largest model at different cost savings.

Note that many works express computational complexity in FLOPS, even though the given numbers are actually multiply-accumulate operations (MACs), and GMACs = 0.5 * GFLOPs (see https://github.com/sovrasov/flops-counter.pytorch/issues/16 ). Some recent works therefore use GMAC instead of GFLOP to avoid ambiguity.
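
As a back-of-the-envelope reference (my own sketch, not from any listed paper), the MAC count of a standard convolution can be computed as follows:

```python
def conv2d_macs(c_in, c_out, k, h_out, w_out, groups=1):
    """Multiply-accumulate operations of a k x k convolution.

    Each output element needs c_in/groups * k * k multiply-adds;
    FLOPs are commonly reported as 2 * MACs (one multiply + one add).
    """
    return c_out * h_out * w_out * (c_in // groups) * k * k

# e.g. the first 7x7 conv of a ResNet-50 on a 224x224 image:
macs = conv2d_macs(c_in=3, c_out=64, k=7, h_out=112, w_out=112)
print(f"{macs / 1e9:.3f} GMAC, ~{2 * macs / 1e9:.3f} GFLOPs")
```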

Tags used below (note: the tags are incomplete):

  • VID: Video processing

Surveys / overviews

  • Dynamic Neural Networks: A Survey (Arxiv 2021) [pdf] Yizeng Han, Gao Huang, Shiji Song, Le Yang, Honghui Wang, Yulin Wang

Methods

Depth-based methods

Early-exit methods attach intermediate output branches to the network, so that easy inputs can exit after fewer layers.
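
A minimal sketch of the general pattern (a generic confidence-thresholded cascade; not a reimplementation of any paper below, and all module names are placeholders):

```python
import torch.nn as nn

class EarlyExitNet(nn.Module):
    """Backbone stages interleaved with exit classifiers; at inference,
    computation stops at the first exit whose softmax confidence
    exceeds a threshold (shown for batch size 1)."""
    def __init__(self, stages, exits, threshold=0.9):
        super().__init__()
        self.stages = nn.ModuleList(stages)  # feature extractors
        self.exits = nn.ModuleList(exits)    # one classifier per stage
        self.threshold = threshold

    def forward(self, x):
        if self.training:
            # train all exits jointly, e.g. with a summed cross-entropy
            outs = []
            for stage, exit_head in zip(self.stages, self.exits):
                x = stage(x)
                outs.append(exit_head(x))
            return outs
        for stage, exit_head in zip(self.stages, self.exits):
            x = stage(x)
            logits = exit_head(x)
            if logits.softmax(dim=1).amax() >= self.threshold:
                return logits  # confident enough: exit early
        return logits          # the last exit is always taken
```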

  • BranchyNet: Fast inference via early exiting from deep neural networks (ICPR2016) [pdf] [chainer]
    Teerapittayanon S, McDanel B, Kung HT
  • Conditional Deep Learning for Energy-Efficient and Enhanced Pattern Recognition (DATE2016) [pdf]
    P. Panda, A. Sengupta, and K. Roy
  • Adaptive Neural Networks for Efficient Inference (ICML2017) [pdf] [GitHub no code]
    T. Bolukbasi, J. Wang, O. Dekel, and V. Saligrama
  • Dynamic computational time for visual attention (ICCV2017 workshop) [pdf] [torch lua]
    Li, Z., Yang, Y., Liu, X., Zhou, F., Wen, S. and Xu, W.
  • DynExit: A Dynamic Early-Exit Strategy for Deep Residual Networks (SiPS2019) [pdf]
    M. Wang, J. Mo, J. Lin, Z. Wang, and L. Du
  • Improved Techniques for Training Adaptive Deep Networks (ICCV2019) [pdf] [Pytorch]
    H. Li, H. Zhang, X. Qi, Y. Ruigang, and G. Huang
  • Early-exit convolutional neural networks (thesis 2019) [pdf]
    E. Demir
  • Efficient adaptive inference for deep convolutional neural networks using hierarchical early exits (Pattern Recognition 2020) [pdf]
    N. Passalis, J. Raitoharju, A. Tefas, and M. Gabbouj
  • Triple wins: Boosting accuracy, robustness and efficiency together by enabling input-adaptive inference (ICLR2020) [pdf] [pytorch]
    Hu TK, Chen T, Wang H, Wang Z.
  • FrameExit: Conditional Early Exiting for Efficient Video Recognition (CVPR2021) [pdf] Ghodrati, A., Bejnordi, B. E., & Habibian, A.
    [VID]

Layer-skipping methods skip individual layers conditioned on the input image; for instance, easy images require fewer layers than complex ones:
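
The pattern, as a minimal sketch (in the spirit of gated residual blocks like SkipNet and ConvNet-AIG, but not a reimplementation of either; the names are mine):

```python
import torch.nn as nn
import torch.nn.functional as F

class SkippableResBlock(nn.Module):
    """Residual block whose body runs only when an input-dependent gate
    decides to execute it; the identity path keeps the output
    well-defined for any gating pattern."""
    def __init__(self, block, channels, tau=1.0):
        super().__init__()
        self.block = block                    # e.g. a ResNet bottleneck
        self.policy = nn.Linear(channels, 2)  # logits for [skip, execute]
        self.tau = tau

    def forward(self, x):
        logits = self.policy(x.mean(dim=(2, 3)))  # pooled features -> gate
        if self.training:
            g = F.gumbel_softmax(logits, tau=self.tau, hard=True)[:, 1]
        else:
            g = logits.argmax(dim=1).float()      # index 1 means execute
        g = g.view(-1, 1, 1, 1)
        # FLOPs are saved only if skipped samples are truly not computed,
        # e.g. with batch size 1 or by regrouping samples per decision.
        return x + g * self.block(x)
```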

  • Adaptive Computation Time for Recurrent Neural Networks (NIPS 2016 Deep Learning Symposium) [pdf] [unofficial pytorch]
    A. Graves
  • Convolutional Networks with Adaptive Inference Graphs (ECCV2018) [pdf] [Pytorch]
    A. Veit and S. Belongie
  • SkipNet: Learning Dynamic Routing in Convolutional Networks (ECCV2018) [pdf] [Pytorch]
    X. Wang, F. Yu, Z.-Y. Dou, T. Darrell, and J. E. Gonzalez
  • BlockDrop: Dynamic Inference Paths in Residual Networks (CVPR2018) [pdf] [Pytorch]
    Zuxuan Wu*, Tushar Nagarajan*, Abhishek Kumar, Steven Rennie, Larry S. Davis, Kristen Grauman, and Rogerio Feris
  • Dynamic Multi-path Neural Network (Arxiv2019) [pdf]
    Su, Y., Zhou, S., Wu, Y., Su, T., Liang, D., Liu, J., Zheng, D., Wang, Y., Yan, J. and Hu, X.
  • EnergyNet: Energy-efficient dynamic inference (2018) [pdf]
    Wang, Yue, et al.
  • Dual dynamic inference: Enabling more efficient, adaptive and controllable deep inference (IEEE Journal of Selected Topics in Signal Processing 2020) [pdf]
    Wang Y, Shen J, Hu TK, Xu P, Nguyen T, Baraniuk RG, Wang Z, Lin Y.
  • CoDiNet: Path Distribution Modeling with Consistency and Diversity for Dynamic Routing (TPAMI 2021) [pdf]

Recursive methods execute some layers multiple times ('recursively'), depending on input complexity:
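
A minimal sketch of this idea (a simplified, ACT-inspired halting rule; the methods below differ in how the halting decision is trained):

```python
import torch.nn as nn

class RecursiveBlock(nn.Module):
    """Applies one weight-shared block up to max_steps times; a small
    halting head decides after each step whether to stop early
    (simplified inference-time rule, applied per batch)."""
    def __init__(self, block, channels, max_steps=4, threshold=0.5):
        super().__init__()
        self.block = block               # reused at every step
        self.halt = nn.Linear(channels, 1)
        self.max_steps = max_steps
        self.threshold = threshold

    def forward(self, x):
        for _ in range(self.max_steps):
            x = self.block(x)
            p_halt = self.halt(x.mean(dim=(2, 3))).sigmoid()
            if bool((p_halt > self.threshold).all()):
                break                    # every sample wants to stop
        return x
```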

  • IamNN: Iterative and Adaptive Mobile Neural Network for Efficient Image Classification (ICLR2018 Workshop) [pdf]
    S. Leroux, P. Molchanov, P. Simoens, B. Dhoedt, T. Breuel, and J. Kautz
  • Dynamic recursive neural network (CVPR2019) [pdf]
    Guo, Q., Yu, Z., Wu, Y., Liang, D., Qin, H., and Yan, J.

Channel-based methods

Channel-based methods execute only a subset of the channels to reduce computational complexity.
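
A minimal sketch of per-channel gating (my own illustration; the methods below differ in the policy architecture and the relaxation used for training):

```python
import torch
import torch.nn as nn

class ChannelGate(nn.Module):
    """Predicts a per-channel on/off mask from globally pooled features
    and multiplies it into the activation; when entire channels are
    gated off, the filters producing them can be skipped."""
    def __init__(self, channels, tau=1.0):
        super().__init__()
        self.policy = nn.Linear(channels, channels)
        self.tau = tau

    def forward(self, x):
        scores = self.policy(x.mean(dim=(2, 3)))  # (B, C) channel scores
        if self.training:
            # soft relaxation; per-channel Gumbel-Softmax or
            # straight-through estimators are common alternatives
            mask = torch.sigmoid(scores / self.tau)
        else:
            mask = (scores > 0).float()           # hard on/off decision
        return x * mask.view(x.size(0), -1, 1, 1)
```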

  • Estimating or propagating gradients through stochastic neurons for conditional computation (Arxiv2013) [pdf]
    Bengio Y, Léonard N, Courville A.

  • Runtime Neural Pruning (NIPS2017) [pdf]
    J. Lin, Y. Rao, J. Lu, and J. Zhou

  • Dynamic Channel Pruning: Feature Boosting and Suppression (ICLR2019) [pdf] [tensorflow] [unofficial pytorch]
    X. Gao, Y. Zhao, Ł. Dudziak, R. Mullins, and C. Xu.

  • Channel Gating Neural Networks (NIPS2019) [pdf] [pytorch]
    W. Hua, Y. Zhou, C. M. De Sa, Z. Zhang, and G. E. Suh

  • You Look Twice: GaterNet for Dynamic Filter Selection in CNNs (CVPR2019) [pdf]
    Z. Chen, Y. Li, S. Bengio, and S. Si

  • Runtime Network Routing for Efficient Image Classification (TPAMI2019) [pdf]
    Y. Rao, J. Lu, J. Lin, and J. Zhou

  • Dynamic Neural Network Channel Execution for Efficient Training (BMVC2019) [pdf]
    S. E. Spasov and P. Lio

  • Learning Instance-wise Sparsity for Accelerating Deep Models (IJCAI2019) [pdf]
    Liu C, Wang Y, Han K, Xu C, Xu C.

  • Batch-Shaping for Learning Conditional Channel Gated Networks (ICLR2020) [pdf]
    BE Bejnordi, T Blankevoort, M Welling

  • Dynamic slimmable network (CVPR2021) [pdf] [pytorch]
    Li, Changlin, et al.

  • Dynamic Slimmable Denoising Network (2021) [pdf] Jiang, Zutao, Changlin Li, Xiaojun Chang, Jihua Zhu, and Yi Yang

  • DS-Net++: Dynamic Weight Slicing for Efficient Inference in CNNs and Transformers (2021) [pdf] Li, C., Wang, G., Wang, B., Liang, X., Li, Z., & Chang, X.

  • Borrowing from yourself: Faster future video segmentation with partial channel update (2022) [pdf]

  • Multi-dimensional dynamic model compression for efficient image super-resolution (WACV2022) [pdf]

Spatial methods

Spatial methods exploit spatial redundancies, such as unimportant image regions, to save computation.
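
A minimal sketch of pixel-level conditional execution (a dense reference implementation; as noted in the Background section, it only reduces the theoretical FLOPs, while efficient implementations gather the active positions):

```python
import torch.nn as nn

class SpatialGatedConv(nn.Module):
    """Predicts a binary spatial mask with a cheap 1x1 policy and zeroes
    the convolution output elsewhere. This dense version computes
    everything and masks afterwards; efficient versions only evaluate
    the convolution at (blocks of) active pixels."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, 3, padding=1)
        self.mask_head = nn.Conv2d(c_in, 1, 1)  # cheap per-pixel policy

    def forward(self, x):
        # hard threshold shown for inference; training would use a
        # differentiable relaxation such as Gumbel-Softmax
        mask = (self.mask_head(x) > 0).float()   # (B, 1, H, W)
        return self.conv(x) * mask
```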

Spatial per-pixel

  • PerforatedCNNs: Acceleration through Elimination of Redundant Convolutions (NIPS2016) [pdf] [matconvnet] [caffe]
    M. Figurnov, A. Ibraimova, D. P. Vetrov, and P. Kohli

  • Spatially Adaptive Computation Time for Residual Networks (CVPR2017) [pdf] [tensorflow]
    Figurnov M, Collins MD, Zhu Y, Zhang L, Huang J, Vetrov D, Salakhutdinov R.

  • Pixel-wise Attentional Gating for Parsimonious Pixel Labeling (WACV2019) [pdf] [matconvnet]
    S. Kong and C. Fowlkes

  • Boosting the Performance of CNN Accelerators with Dynamic Fine-Grained Channel Gating (MICRO2019) [pdf] Weizhe Hua, Yuan Zhou, Christopher De Sa, Zhiru Zhang, and G. Edward Suh

  • Dynamic Convolutions: Exploiting Spatial Sparsity for Faster Inference (CVPR2020) [pdf] [Pytorch]
T. Verelst and T. Tuytelaars

  • Spatially Adaptive Inference with Stochastic Feature Sampling and Interpolation (ECCV2020) [pdf] [pytorch]
    Z. Xie, Z. Zhang, X. Zhu, G. Huang, and S. Lin

  • Precision Gating: Improving Neural Network Efficiency with Dynamic Dual-Precision Activation (ICLR2020) [pdf] [pytorch]
    Zhang Y, Zhao R, Hua W, Xu N, Suh GE, Zhang Z.

  • Dynamic Dual Gating Neural Networks (ICCV2021) [pdf]

  • Skip-Convolutions for Efficient Video Processing (CVPR2021) [pdf] [pytorch]
    [VID]

  • Focal Sparse Convolutional Networks for 3D Object Detection (CVPR2022) [pdf]

Spatial per-block

  • SBNet: Sparse Blocks Network for Fast Inference (CVPR2018) [pdf] [tensorflow]
    M. Ren, A. Pokrovsky, B. Yang, and R. Urtasun

  • Uncertainty based model selection for fast semantic segmentation (MVA2019) [pdf]

  • SegBlocks: Block-Based Dynamic Resolution Networks for Real-Time Segmentation (ECCV2020 Workshop) [pdf] Thomas Verelst and Tinne Tuytelaars

  • Spatially Adaptive Feature Refinement for Efficient Inference [pdf]
    Y Han, G Huang, S Song, L Yang, Y Zhang, H Jiang

  • BlockCopy: High-Resolution Video Processing with Block-Sparse Feature Propagation and Online Policies (ICCV 2021) [pdf] Thomas Verelst and Tinne Tuytelaars
    [VID]

Spatial warping

  • Learning to Zoom: A Saliency-Based Sampling Layer for Neural Networks (ECCV2018) [pdf] [pytorch] Adria Recasens, Petr Kellnhofer, Simon Stent, Wojciech Matusik, Antonio Torralba

Glances and dynamic crops

These methods take one or more crops of the input image to further refine predictions:
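
A minimal glance-then-focus sketch (batch size 1; `glance_net` and `focus_net` are hypothetical placeholders, and the methods below train the crop location rather than using a fixed saliency argmax):

```python
import torch.nn.functional as F

def glance_and_focus(image, glance_net, focus_net, crop=96, conf=0.9):
    """Cheap 'glance' on a downscaled image; if the prediction is not
    confident enough, refine with a full-resolution crop around the
    most salient location found by the glance network."""
    small = F.interpolate(image, scale_factor=0.25, mode='bilinear',
                          align_corners=False)
    logits, saliency = glance_net(small)       # saliency: (1, 1, h, w)
    if logits.softmax(dim=1).amax() >= conf:
        return logits                          # confident: stop early
    # map the saliency peak back to full-resolution coordinates
    _, _, h, w = saliency.shape
    idx = int(saliency.flatten().argmax())
    cy = (idx // w) * image.shape[2] // h
    cx = (idx % w) * image.shape[3] // w
    y0 = max(0, min(cy - crop // 2, image.shape[2] - crop))
    x0 = max(0, min(cx - crop // 2, image.shape[3] - crop))
    return focus_net(image[:, :, y0:y0 + crop, x0:x0 + crop])
```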

  • Action Recognition using Visual Attention (ICLR 2016 Workshop) [pdf] [theano]
    S. Sharma, R. Kiros, and R. Salakhutdinov

  • Recurrent Models of Visual Attention (NIPS2014) [pdf]
    V. Mnih, N. Heess, A. Graves, and koray kavukcuoglu

  • Dynamic Capacity Networks (ICML2016) [pdf] [tensorflow] [unofficial pytorch]
    A. Almahairi, N. Ballas, T. Cooijmans, Y. Zheng, H. Larochelle, and A. Courville

  • Glance and Focus: a Dynamic Approach to Reducing Spatial Redundancy in Image Classification (NIPS2020) [pdf]
    Y. Wang, K. Lv, R. Huang, S. Song, L. Yang, and G. Huang

  • Learning Where to Focus for Efficient Video Object Detection (ECCV2020) [pdf] [github]
    Z. Jiang et al.

  • Adaptive Focus for Efficient Video Recognition (2021) [pdf]
    Yulin Wang, Zhaoxi Chen, Haojun Jiang, Shiji Song, Yizeng Han, Gao Huang
    [VID]

  • Adafocus v2: End-to-end training of spatial dynamic networks for video recognition (2021) [pdf] [VID]

Other (dilation, etc.)

  • D^2Conv3D: Dynamic Dilated Convolutions for Object Segmentation in Videos [pdf] Christian Schmidt, Ali Athar, Sabarinath Mahadevan, Bastian Leibe
    [VID]

Adaptive resolution methods

Adaptive resolution methods adapt the processing resolution to the input image.
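
A minimal sketch of the idea (`policy_net` and `classifier` are hypothetical placeholders; the listed methods train the resolution choice end to end rather than with a fixed rule):

```python
import torch.nn.functional as F

def adaptive_resolution_forward(image, policy_net, classifier,
                                scales=(0.5, 0.75, 1.0)):
    """A lightweight policy inspects a thumbnail and picks the resolution
    at which the main classifier runs; a scale s costs roughly s^2 of
    the full-resolution FLOPs (batch size 1, simplified)."""
    thumb = F.interpolate(image, scale_factor=0.25, mode='bilinear',
                          align_corners=False)
    s = scales[int(policy_net(thumb).argmax())]  # pick one scale
    if s != 1.0:
        image = F.interpolate(image, scale_factor=s, mode='bilinear',
                              align_corners=False)
    return classifier(image)
```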

  • Resolution Adaptive Networks for Efficient Inference (CVPR2020) [pdf] [pytorch]
    L. Yang, Y. Han, X. Chen, S. Song, J. Dai, and G. Huang
  • Resolution Switchable Networks for Runtime Efficient Image Recognition (ECCV2020) [pdf] [pytorch]
    Y. Wang, F. Sun, D. Li, and A. Yao
  • Dynamic Resolution Network (2021) [pdf]
  • Multi-dimensional dynamic model compression for efficient image super-resolution (WACV2022) [pdf]

Transformers

  • Dynamically Pruning Segformer for Efficient Semantic Segmentation (Arxiv2021) [pdf]
    Haoli Bai, Hongda Mao, Dinesh Nair

  • Spatio-Temporal Gated Transformers for Efficient Video Processing (2021) [pdf]
    Yawei Li, Babak Ehteshami Bejnordi, Bert Moons, Tijmen Blankevoort, Amirhossein Habibian, Radu Timofte, Luc Van Gool
    [VID]

  • Not All Images are Worth 16x16 Words: Dynamic Transformers for Efficient Image Recognition (NIPS2021) [pdf]
    Yulin Wang, Rui Huang, Shiji Song, Zeyi Huang, Gao Huang

  • Multi-Exit Vision Transformer for Dynamic Inference (2021) [pdf]
    A Bakhtiarnia, Q Zhang, A Iosifidis

  • Dynamic Grained Encoder for Vision Transformers (NIPS2021) [pdf]
Song, Lin, Songyang Zhang, Songtao Liu, Zeming Li, Xuming He, Hongbin Sun, Jian Sun, and Nanning Zheng

  • A-ViT: Adaptive Tokens for Efficient Vision Transformer (CVPR2022) [pdf]

Dynamic filters/weights

  • Dynamic filter networks (NIPS2016) [pdf]
    Jia, X., De Brabandere, B., Tuytelaars, T., & Gool, L. V.

  • Dynamic region-aware convolution (CVPR2021) [pdf]
    Chen, J., Wang, X., Guo, Z., Zhang, X., & Sun, J.

  • Decoupled Dynamic Filter Networks (CVPR2021) [pdf]

  • Involution: Inverting the inherence of convolution for visual recognition (CVPR2021) [pdf] [pytorch] [unofficial tf]

  • Adaptive Convolutions with Per-pixel Dynamic Filter Atom (ICCV2021) [pdf]

Quantization

  • Instance-Aware Dynamic Neural Network Quantization (CVPR2022) [pdf](https://openaccess.thecvf.com/content/CVPR2022/html/Liu_Instance-Aware_Dynamic_Neural_Network_Quantization_CVPR_2022_paper.html)

Mixture of experts

  • HydraNets: Specialized Dynamic Architectures for Efficient Inference (CVPR2019) [pdf]
    Teja Mullapudi R, Mark WR, Shazeer N, Fatahalian K.
  • Outrageously large neural networks: The sparsely-gated mixture-of-experts layer (ICLR 2017) [pdf] [unofficial pytorch]
    Shazeer N, Mirhoseini A, Maziarz K, Davis A, Le Q, Hinton G, Dean J.

Other

Video

(if not listed above with [VID] tag)

  • Leaky Gated Cross-Attention for Weakly Supervised Multi-Modal Temporal Action Localization (WACV2022) [pdf] [VID]

  • ELIχR: Eliminating Computation Redundancy in CNN-Based Video Processing (RSDHA2021) [IEEE] [VID]