This deep learning tutorial is a curated summary for anyone who wants to join a deep learning group.


  • More state-of-the-art papers and methods will be added over time.

Book List

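Chinese Books

  • 《机器学习实战》 (Machine Learning in Action) -- Peter Harrington

  • 《机器学习》 (Machine Learning) -- 周志华 (Zhou Zhihua)

  • 《统计学习方法》 (Statistical Learning Methods) -- 李航 (Li Hang)

  • 《神经网络与深度学习》 (Neural Networks and Deep Learning) -- 邱锡鹏 (Qiu Xipeng). link

  • 《深度学习》 (Deep Learning, Chinese edition) -- Ian Goodfellow, Yoshua Bengio et al. link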

English Books

  • 《Deep Learning》--Ian Goodfellow, Yoshua Bengio et al.

  • 《Machine Learning Yearning》-- Andrew Ng

  • 《Pattern Recognition and Machine Learning》--Christopher M. Bishop

    • book:

    • access code: cquc

    • codes


  • 《Reinforcement Learning: An Introduction》--Richard Sutton

    • book:


    • codes:


    • course materials:


Courses List

Paper List


Computer Vision

Image Revolution

[0] Graves, Alex. "Generating sequences with recurrent neural networks." arXiv preprint arXiv:1308.0850 (2013). (LSTM; very nice generation results that show the power of RNNs)

[1] Cho, Kyunghyun, et al. "Learning phrase representations using RNN encoder-decoder for statistical machine translation." arXiv preprint arXiv:1406.1078 (2014). (First Seq-to-Seq paper)

[2] Sutskever, Ilya, Oriol Vinyals, and Quoc V. Le. "Sequence to sequence learning with neural networks." Advances in neural information processing systems. 2014. (Outstanding work) :star::star::star::star::star:

[3] Bahdanau, Dzmitry, KyungHyun Cho, and Yoshua Bengio. "Neural Machine Translation by Jointly Learning to Align and Translate." arXiv preprint arXiv:1409.0473 (2014).

[4] Vinyals, Oriol, and Quoc Le. "A neural conversational model." arXiv preprint arXiv:1506.05869 (2015). (Seq-to-Seq for chatbots)

[5] Understanding LSTM Networks :star::star::star::star::star:

CNN (Convolutional Neural Networks)

[0] Dilated Convolutional Kernel - Fisher Yu, Vladlen Koltun: Multi-Scale Context Aggregation by Dilated Convolutions. ICLR (2016)

[1] Deformable Convolutional Kernel - Jifeng Dai, Haozhi Qi, Yuwen Xiong, Yi Li, Guodong Zhang, Han Hu, Yichen Wei: Deformable Convolutional Networks. CoRR abs/1703.06211 (2017)

[2] Convolution Operations. link

[3] Convolution Analyzer. link

[4] What Do We Understand About Convolutional Networks? link
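As a rough illustration of the dilated convolution idea in [0], the sketch below applies a dilated kernel to a 1-D signal in plain Python. The function name `dilated_conv1d`, the 1-D setting, and the absence of padding are assumptions made for brevity; the paper itself works with 2-D feature maps.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """'Valid' 1-D cross-correlation with a dilated kernel (hypothetical helper).

    With dilation d, neighbouring kernel taps read inputs that are d positions
    apart, so the receptive field grows without adding any weights.
    """
    # Number of input positions covered by one output value.
    span = (len(kernel) - 1) * dilation + 1
    out = []
    for start in range(len(signal) - span + 1):
        out.append(sum(k * signal[start + i * dilation]
                       for i, k in enumerate(kernel)))
    return out


signal = [1, 2, 3, 4, 5, 6]
kernel = [1, 0, -1]
print(dilated_conv1d(signal, kernel, dilation=1))  # [-2, -2, -2, -2]
print(dilated_conv1d(signal, kernel, dilation=2))  # [-4, -4]
```

The same three weights cover three inputs at dilation 1 and five inputs at dilation 2, which is the property that lets stacked dilated layers aggregate multi-scale context.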

Lightweight Convolutional Neural Networks

[0] Forrest N. Iandola, Matthew W. Moskewicz, Khalid Ashraf, Song Han, William J. Dally, Kurt Keutzer: SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <1MB model size. CoRR abs/1602.07360 (2016). SqueezeNet

[1] Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, Hartwig Adam: MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. CoRR abs/1704.04861 (2017). MobileNets

[2] Mark Sandler, Andrew G. Howard, Menglong Zhu, Andrey Zhmoginov, Liang-Chieh Chen: Inverted Residuals and Linear Bottlenecks: Mobile Networks for Classification, Detection and Segmentation. CoRR abs/1801.04381 (2018). MobileNets_V2

[3] François Chollet: Xception: Deep Learning with Depthwise Separable Convolutions. CVPR 2017: 1800-1807. Xception

[4] Xiangyu Zhang, Xinyu Zhou, Mengxiao Lin, Jian Sun: ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices. CoRR abs/1707.01083 (2017). ShuffleNet

[5] Barret Zoph, Vijay Vasudevan, Jonathon Shlens, Quoc V. Le: Learning Transferable Architectures for Scalable Image Recognition. CoRR abs/1707.07012 (2017). NasNet

[6] Robert J. Wang, Xiang Li, Shuang Ao, Charles X. Ling: Pelee: A Real-Time Object Detection System on Mobile Devices. CoRR abs/1804.06882 (2018). PeleeNet
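Several of the networks above (MobileNets, Xception) rely on depthwise separable convolutions. A small arithmetic sketch may help show where the savings come from. The helper names and the layer sizes below are illustrative assumptions, and bias terms are ignored.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Weights in a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


# Hypothetical layer: 3 x 3 kernel, 64 input channels, 128 output channels.
std = standard_conv_params(3, 64, 128)
sep = depthwise_separable_params(3, 64, 128)
print(std)  # 73728
print(sep)  # 8768
print(sep / std)  # equals 1/c_out + 1/k**2, roughly 0.119
```

For a 3 x 3 kernel the ratio is dominated by the 1/k² term, so the separable layer needs roughly one ninth of the weights of the standard one.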

Model Constraints

[0] Hinton, Geoffrey E., et al. "Improving neural networks by preventing co-adaptation of feature detectors." arXiv preprint arXiv:1207.0580 (2012). (Dropout)

[1] Srivastava, Nitish, et al. "Dropout: a simple way to prevent neural networks from overfitting." Journal of Machine Learning Research 15.1 (2014): 1929-1958.

[2] Ioffe, Sergey, and Christian Szegedy. "Batch normalization: Accelerating deep network training by reducing internal covariate shift." arXiv preprint arXiv:1502.03167 (2015). (An outstanding work in 2015)

[3] Ba, Jimmy Lei, Jamie Ryan Kiros, and Geoffrey E. Hinton. "Layer normalization." arXiv preprint arXiv:1607.06450 (2016). (Update of Batch Normalization)

[4] Courbariaux, Matthieu, et al. "Binarized Neural Networks: Training Neural Networks with Weights and Activations Constrained to +1 or -1." (New model, fast)

[5] Jaderberg, Max, et al. "Decoupled neural interfaces using synthetic gradients." arXiv preprint arXiv:1608.05343 (2016). (Innovation of Training Method,Amazing Work) :star::star::star::star::star:

[6] Chen, Tianqi, Ian Goodfellow, and Jonathon Shlens. "Net2net: Accelerating learning via knowledge transfer." arXiv preprint arXiv:1511.05641 (2015). (Modify previously trained network to reduce training epochs)

[7] Wei, Tao, et al. "Network Morphism." arXiv preprint arXiv:1603.01670 (2016). (Modify previously trained network to reduce training epochs)

[8] Han, Song, Huizi Mao, and William J. Dally. "Deep compression: Compressing deep neural network with pruning, trained quantization and huffman coding." CoRR, abs/1510.00149 (2015). (ICLR best paper; a new direction for making NNs run fast; DeePhi Tech startup) :star::star::star::star::star:

[9] Iandola, Forrest N., et al. "SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <1MB model size." arXiv preprint arXiv:1602.07360 (2016). (Also a new direction for optimizing NNs; DeePhi Tech startup)
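To make two of the ideas above concrete, here is a minimal plain-Python sketch of dropout and of the batch normalization transform. It uses the "inverted" dropout variant common in current frameworks (rescaling at training time), which differs from the test-time weight scaling described in [0] and [1]. The batch norm function covers only the training-time normalization of a single scalar feature. All function names and defaults are assumptions for illustration.

```python
import math
import random


def inverted_dropout(values, p_drop=0.5, rng=None):
    """Zero each value with probability p_drop and rescale the survivors.

    Dividing by the keep probability keeps the expected activation unchanged,
    so no rescaling is needed at test time (the 'inverted' variant).
    """
    if not 0.0 <= p_drop < 1.0:
        raise ValueError("p_drop must be in [0, 1)")
    if rng is None:
        rng = random.Random(0)  # fixed seed only to keep the sketch reproducible
    keep = 1.0 - p_drop
    return [v / keep if rng.random() < keep else 0.0 for v in values]


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one scalar feature over a mini-batch, then scale and shift."""
    mean = sum(batch) / len(batch)
    var = sum((x - mean) ** 2 for x in batch) / len(batch)
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]


dropped = inverted_dropout([1.0] * 8, p_drop=0.5)
normed = batch_norm([1.0, 2.0, 3.0, 4.0])
print(dropped)  # each entry is either 0.0 or 2.0
print(normed)   # zero mean, variance close to 1
```

A full batch norm layer would also track running statistics for inference and learn `gamma` and `beta`; both are left out here.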


Optimization Methods

[0] Sebastian Ruder: An overview of gradient descent optimization algorithms. CoRR abs/1609.04747 (2016) :star::star::star::star::star:

[1] Back Propagation Algorithm

[2] Andrychowicz, Marcin, et al. "Learning to learn by gradient descent by gradient descent." arXiv preprint arXiv:1606.04474 (2016). (Neural optimizer, amazing work)

Optimization Algorithms

  • Momentum
  • Nesterov accelerated gradient
  • Adagrad
  • Adadelta
  • RMSprop
  • Adam
  • AdaMax
  • Nadam

:star::star::star::star::star: Adam is usually a good default choice.
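For readers new to the list above, the following sketch shows the update rules for two of its entries, Momentum and Adam, on a single scalar parameter. It follows the standard formulations as summarized in Ruder's overview. The function names, hyperparameter defaults, and the toy quadratic objective are assumptions chosen for illustration.

```python
import math


def sgd_momentum_step(w, grad, velocity, lr=0.01, momentum=0.9):
    """One SGD-with-momentum update; returns the new weight and velocity."""
    velocity = momentum * velocity - lr * grad
    return w + velocity, velocity


def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; t is the 1-based step count used for bias correction."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias-corrected moments
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v


def minimize_quadratic(steps=500, lr=0.1):
    """Minimize the toy objective f(w) = (w - 3)^2 with Adam, from w = 0."""
    w, m, v = 0.0, 0.0, 0.0
    for t in range(1, steps + 1):
        grad = 2.0 * (w - 3.0)
        w, m, v = adam_step(w, grad, m, v, t, lr=lr)
    return w


print(minimize_quadratic())  # should end up close to 3
```

In practice these rules are applied element-wise to every parameter tensor, and the frameworks listed later in this document ship tested implementations.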

Types of Activation

  • sigmoid
  • hard sigmoid
  • tanh
  • relu
  • leaky relu
  • elu
  • selu
  • prelu
  • maxout
  • swish
  • softplus
  • softshrink
  • softsign
  • tanhshrink
  • softmin
  • softmax
  • logsoftmax
  • softmax2d
  • etc.

relu, leaky relu, tanh, and sigmoid are strongly recommended.
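A few of the activations listed above can be written in a handful of lines, which may help when comparing them. This is a scalar, plain-Python sketch; the function names and default slopes are assumptions, and real frameworks apply the same formulas element-wise to tensors. `tanh` is available directly as `math.tanh`.

```python
import math


def sigmoid(x):
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def relu(x):
    return max(0.0, x)


def leaky_relu(x, negative_slope=0.01):
    """Like relu, but keeps a small gradient for negative inputs."""
    return x if x > 0 else negative_slope * x


def elu(x, alpha=1.0):
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def swish(x):
    return x * sigmoid(x)


def softplus(x):
    """log(1 + exp(x)), computed in a numerically stable form."""
    return math.log1p(math.exp(-abs(x))) + max(x, 0.0)


def softmax(xs):
    """Softmax over a list; subtracting the max avoids overflow."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


print(sigmoid(0.0))      # 0.5
print(leaky_relu(-2.0))  # -0.02
print(softmax([1000.0, 1000.0]))  # [0.5, 0.5]
```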

Journals and Periodicals

Machine Learning and Theories

  • NIPS
  • ICML
  • ICLR

Computer Vision

  • CVPR
  • ICCV
  • ECCV

Natural Language Processing

  • ACL

Artificial Intelligence

  • AAAI

Public Accounts (WeChat)

  • 机器之心 (Synced)
  • 新智元 (AI Era)

Deep Learning Frameworks (open source)

  • TensorFlow

    • TensorFlow Tutorial Summary

    • Learning code:

    • TensorFlow-Slim:

    • TensorFlow modules:

    • TensorFlow pre-trained models

    • TensorFlow model zoo

  • Caffe

  • PyTorch

    • Learning code:
    • blog:
    • PyTorch summary:
  • Keras

  • MXNet

  • etc.

New Architectures

  • Convolutional Neural Networks

  • Recurrent Neural Networks

  • Generative Adversarial Networks

  • Capsules (Dynamic Routing Between Capsules, by Hinton)

    • official code

  • DenseNet: Densely Connected Convolutional Networks. DenseNet

  • DiracNets: Training Very Deep Neural Networks Without Skip-Connections. DiracNet

  • Non-local Neural Networks. Non-Local Nets

  • Convolutional Neural Networks with Alternately Updated Clique. CliqueNet

Other Sources

Generative Adversarial Networks (GAN):

  • GAN Paper

  • GAN Tricks

  • GAN Tutorial, CVPR 2018

  • From GAN to WGAN

  • GAN Codes




  • GAN Performance Report

  • GAN video

  • 10 GAN papers (strongly recommended)

    • Progressive Growing of GANs for Improved Quality, Stability, and Variation
    • Spectral Normalization for Generative Adversarial Networks
    • cGANs with Projection Discriminator
    • High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs
    • Are GANs Created Equal? A Large-Scale Study
    • Improved Training of Wasserstein GANs
    • StackGAN++: Realistic Image Synthesis with Stacked Generative Adversarial Networks
    • Privacy-preserving generative deep neural networks support clinical data sharing
    • Adversarial Variational Bayes: Unifying Variational Autoencoders and Generative Adversarial Networks
    • Gradient descent GAN optimization is locally stable
  • Interesting GAN projects

    (1) CycleGAN

    (2) Progressive Growing of GANs

Deep Architecture Genealogy

  • deep_architecture_genealogy:
  • coggle link:

Python Resources


Computer Vision


Geometry and SLAM

Object Detection

Face Datasets

Dehazing Datasets

  • I-HAZE
  • O-HAZE

Deraining Datasets

  • Rain100H
  • Rain100L
  • Rain12
  • Rain800
  • Rain14000
  • RainHML


