
Deep Learning Glossary

Simple, opinionated explanations of various things encountered in Deep Learning / AI / ML.

Contributions welcome - there may be errors here!

Contests

ILSVRC = ImageNet Large Scale Visual Recognition Challenge

The most prominent computer vision contest, based on the largest labeled image data set (ImageNet). Progress on its classification task led CNNs to dominate the field of computer vision.

| Year | Model | Top-5 Error | Layers | Paper |
| ---- | ----- | ----------- | ------ | ----- |
| 2012 | AlexNet | 17.0% | 8 | http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks |
| 2013 | ZFNet | 17.0% | 8 | http://arxiv.org/abs/1311.2901 |
| 2014 | VGG-19 | 8.43% | 19 | http://arxiv.org/abs/1409.1556 |
| 2014 | GoogLeNet / Inception | 7.89% | 22 | http://arxiv.org/abs/1409.4842 |
| 2015 | Inception v3 | | | http://arxiv.org/abs/1512.00567 |
| 2015 | ResNet | 4.49% | 152 | http://arxiv.org/abs/1512.03385 |

Techniques

Stochastic Gradient Descent (SGD)

The original and simplest gradient-based optimization algorithm for neural networks: repeatedly step the weights against the loss gradient computed on a mini-batch (via backpropagation). Still used everywhere!
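
A minimal sketch of one update step (names here are illustrative; grad is the mini-batch gradient of the loss with respect to the parameters w):

def sgd_step(w, grad, lr=0.01):
    # Step against the gradient, scaled by the learning rate.
    return w - lr * grad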

SGD with Momentum

A simple and widely used improvement to SGD: accumulate an exponentially decaying sum of past gradients and update the weights along that running direction.
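
A sketch of the classical momentum update, assuming a decay factor mu of around 0.9 (names are illustrative):

def momentum_step(w, grad, velocity, lr=0.01, mu=0.9):
    # Velocity is an exponentially decaying sum of past gradient steps.
    velocity = mu * velocity - lr * grad
    return w + velocity, velocity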

Adagrad

An adaptive optimizer that gives each parameter its own learning rate, shrinking it in proportion to the square root of the sum of that parameter's past squared gradients (Duchi et al., 2011).
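
A sketch of the per-parameter update, using NumPy for the element-wise operations (names are illustrative):

import numpy as np

def adagrad_step(w, grad, sum_sq, lr=0.01, eps=1e-8):
    # Accumulate the squared gradient for each parameter...
    sum_sq = sum_sq + grad ** 2
    # ...and shrink that parameter's effective learning rate accordingly.
    w = w - lr * grad / (np.sqrt(sum_sq) + eps)
    return w, sum_sq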

Adam Optimizer

Kingma & Ba, 2015

FTRL-Proximal (Follow-the-Regularized-Leader)

An online learning algorithm with per-coordinate learning rates and L1 regularization that yields sparse models; used at Google for large-scale click-through-rate prediction.

McMahan et al. (Google), 2013
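
A sketch of the per-coordinate update from Algorithm 1 of the paper (hyperparameter names follow the paper; the default values here are placeholders):

import numpy as np

def ftrl_step(w, grad, z, n, alpha=0.1, beta=1.0, l1=1.0, l2=1.0):
    # Per-coordinate adaptive learning rates, as in Adagrad.
    sigma = (np.sqrt(n + grad ** 2) - np.sqrt(n)) / alpha
    z = z + grad - sigma * w
    n = n + grad ** 2
    # Closed-form "follow the regularized leader" solution: soft-thresholding
    # by l1 drives small weights exactly to zero, yielding sparse models.
    w = np.where(np.abs(z) <= l1, 0.0,
                 -(z - np.sign(z) * l1) / ((beta + np.sqrt(n)) / alpha + l2))
    return w, z, n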

Rectified Linear Unit (ReLU)

Rectified linear unit is a common activation function, popularized by AlexNet. Recommended over the sigmoid activation.

relu(x) = max(x, 0)

Nair & Hinton, 2010

Parametric Rectified Linear Unit (PReLU)

A leaky ReLU whose negative-side slope is learned during training rather than fixed (He et al., 2015).

http://arxiv.org/pdf/1502.01852v1

Leaky Rectified Linear Unit

Sometimes the inputs to a ReLU get pushed far negative, in which case the neuron's output and gradient are both zero and it can get permanently stuck off (the "dying ReLU" problem). The leaky ReLU combats this by giving the negative side a small slope, so some gradient always flows.

def leaky_relu(x):
    # A small slope (here 0.01) on the negative side keeps a nonzero gradient.
    return x if x >= 0 else 0.01 * x

Batch Normalization (BN)

Normalizing the inputs to each activation function (to zero mean and unit variance over each mini-batch) can dramatically speed up learning.

Ioffe & Szegedy, 2015

Dropout

Introduced by Hinton et al., 2012 and popularized by AlexNet. Randomly zero out a fraction (commonly 50%) of activations during the forward pass at training time. A simple and effective regularizer.
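
A sketch of the common "inverted" dropout variant, which rescales at training time so the test-time forward pass needs no change (the original formulation instead scaled the weights at test time):

import numpy as np

def dropout(x, p=0.5, training=True):
    if not training:
        return x
    # Zero out each activation with probability p, then rescale the survivors.
    mask = np.random.rand(*x.shape) >= p
    return x * mask / (1.0 - p)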

LSTM = Long Short Term Memory

A type of RNN whose gated, additive cell-state updates mitigate the vanishing gradient problem that plain RNNs suffer over long sequences.

http://colah.github.io/posts/2015-08-Understanding-LSTMs/

The post above is a great walkthrough of the architecture and its variants. Systematic comparisons (Greff et al., 2015, "LSTM: A Search Space Odyssey") conclude that the vanilla LSTM performs about as well as any variant.

Originally invented by Hochreiter & Schmidhuber, 1997
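
A minimal sketch of one LSTM step (the single weight matrix W here stacks the four gate matrices; names are illustrative):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_cell(x, h, c, W, b):
    # Project [input, previous hidden state] to the four gate pre-activations.
    z = np.concatenate([x, h]) @ W + b
    i, f, o, g = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)  # input, forget, output gates
    g = np.tanh(g)                                # candidate cell update
    # The additive cell-state update is what eases gradient flow through time.
    c = f * c + i * g
    h = o * np.tanh(c)
    return h, c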

Models

AlexNet

Winner of ILSVRC 2012. Made a huge jump in accuracy over earlier approaches using a deep CNN trained on GPUs, with dropout and ReLUs.

http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks

VGG-16 / VGG-19 / OxfordNet

Close second place in ILSVRC 2014. A very simple CNN architecture using only 3x3 convolutions, max pooling, ReLUs, and dropout.

Neural Random-Access Machine (NRAM)

A neural architecture that learns to manipulate pointers into an external variable-size memory, enabling it to learn simple algorithmic tasks (Kurach et al., 2015).

http://arxiv.org/pdf/1511.06392v1

Grid-LSTM

Arranges LSTM cells along multiple dimensions of a grid, including the depth dimension, rather than only along time (Kalchbrenner et al., 2015).

Neural Turing Machine (NTM)

An RNN coupled to a differentiable external memory that it learns to read and write, enabling it to learn simple algorithms such as copying and sorting (Graves et al., 2014).

Deep Q Network (DQN)

A convolutional network trained with Q-learning to play Atari games directly from raw pixels (Mnih et al., 2013).

Software

Caffe

A C++/CUDA deep learning framework from Berkeley, focused on convolutional networks; models are defined in declarative config files.

TensorFlow

Google's open-source framework. Computation is expressed as a dataflow graph with automatic differentiation.

Theano

A Python library from the University of Montreal that compiles symbolic expression graphs to fast CPU/GPU code; an early home of automatic differentiation in deep learning.

Torch

A scientific computing framework with a Lua front end, widely used in deep learning research.

CuDNN

NVIDIA's library of GPU-optimized deep learning primitives (convolutions, pooling, activations); used as a backend by most of the frameworks above.

MxNet

A lightweight multi-language deep learning framework that mixes symbolic and imperative programming.

Data sets

CIFAR-10

60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images.

https://www.cs.toronto.edu/~kriz/cifar.html

ImageNet

A huge labeled image data set spanning millions of images and thousands of classes; the ILSVRC subset has about 1.2 million training images in 1000 classes.

http://image-net.org/

MNIST

Handwritten digits as 28x28 grayscale images; 60,000 training images and 10,000 test images.

http://yann.lecun.com/exdb/mnist/

IAM Handwriting Database

http://www.iam.unibe.ch/fki/databases/iam-handwriting-database

Famously used in Graves's handwriting generation RNN: http://www.cs.toronto.edu/~graves/handwriting.html

TIMIT Speech corpus

A corpus of read American English speech with time-aligned phonetic and word transcriptions; a classic benchmark for phoneme recognition.

(Garofolo et al., 1993)