# ResNet-MultipleFramework

A ResNet benchmark across Keras (TensorFlow), Keras (MXNet), Chainer, and PyTorch, run on Google Colab.

## Summary

| Framework | N | # Layers | Min Test Error | s / epoch |
|---|---|---|---|---|
| Keras(TF) | 3 | 20 | 0.0965 | 51.817 |
| Keras(MXNet) | 3 | 20 | 0.0963 | 50.207 |
| Chainer | 3 | 20 | 0.0995 | 35.360 |
| PyTorch | 3 | 20 | 0.0986 | 26.602 |
| Keras(TF) | 5 | 32 | 0.0863 | 75.746 |
| Keras(MXNet) | 5 | 32 | 0.0943 | 69.260 |
| Chainer | 5 | 32 | 0.0916 | 56.854 |
| PyTorch | 5 | 32 | 0.0893 | 40.670 |
| Keras(TF) | 7 | 44 | 0.0864 | 96.946 |
| Keras(MXNet) | 7 | 44 | 0.0863 | 86.921 |
| Chainer | 7 | 44 | 0.0892 | 80.935 |
| PyTorch | 7 | 44 | 0.0894 | 55.465 |
| Keras(TF) | 9 | 56 | 0.0816 | 119.361 |
| Keras(MXNet) | 9 | 56 | 0.0848 | 111.772 |
| Chainer | 9 | 56 | 0.0882 | 100.730 |
| PyTorch | 9 | 56 | 0.0895 | 70.834 |
- N denotes the number of shortcuts in each stage, as described in the ResNet paper.
- The number of layers follows the 6N + 2 formula from the paper. Since a stride-2 Conv2D is used for subsampling instead of pooling, the actual layer count is 2 more than this value.
- Time per epoch excludes the first epoch and is the average over the remaining 99 epochs (a measurement sketch follows this list).
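
For illustration, per-epoch time can be collected with a simple callback like the sketch below. This assumes the Keras runs; `EpochTimer` is hypothetical and not the repository's actual timing code:

```python
import time
from keras.callbacks import Callback

class EpochTimer(Callback):
    """Record the wall-clock time of every epoch."""
    def on_train_begin(self, logs=None):
        self.times = []

    def on_epoch_begin(self, epoch, logs=None):
        self._start = time.time()

    def on_epoch_end(self, epoch, logs=None):
        self.times.append(time.time() - self._start)

timer = EpochTimer()
# model.fit(..., epochs=100, callbacks=[timer])
# Drop the warm-up epoch and average the remaining 99:
# s_per_epoch = sum(timer.times[1:]) / 99
```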

## TPU in Colaboratory (updated at 9/30)

I also tried the same experiment using the free TPU in Google Colaboratory.

| Framework | N | # Layers | Min Test Error | s / epoch |
|---|---|---|---|---|
| TF-Keras(TPU) | 3 | 20 | 0.154 | 19.666 |
| TF-Keras(TPU) | 5 | 32 | 0.153 | 19.818 |
| TF-Keras(TPU) | 7 | 44 | 0.167 | 19.969 |
| TF-Keras(TPU) | 9 | 56 | 0.133 | 19.932 |

(TensorFlow 1.11.0-rc2, Keras 2.1.6)
The batch size was changed to 1024.
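
For reference, in this TensorFlow 1.11 era a Keras model was typically converted for Colab's TPU with `keras_to_tpu_model`. A minimal sketch, assuming a compiled tf.keras `model` and CIFAR arrays `x_train`/`y_train` (the repository's actual code may differ):

```python
import os
import tensorflow as tf

# COLAB_TPU_ADDR is set automatically in a Colab TPU runtime.
resolver = tf.contrib.cluster_resolver.TPUClusterResolver(
    tpu='grpc://' + os.environ['COLAB_TPU_ADDR'])

# Convert the Keras model to a TPU-enabled model.
tpu_model = tf.contrib.tpu.keras_to_tpu_model(
    model,
    strategy=tf.contrib.tpu.TPUDistributionStrategy(resolver))

# Batch size 1024 as noted above.
tpu_model.fit(x_train, y_train, batch_size=1024, epochs=100)
```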

TPU is extremely fast! It is at least 6x faster than the GPU version of Keras, and about 3.5x faster than PyTorch, which was the fastest of the GPU frameworks.

Sadly, LearningRateScheduler does not work on the TPU due to a bug in TensorFlow, so accuracy is a little worse than on the GPU.

## See Details (Japanese)

- https://blog.shikoan.com/resnet-multiple-framework/
- https://blog.shikoan.com/colab-tpu-resnet-benchmarks/

## Settings

Parameters are based on the paper, with the following differences:

- Train for 100 epochs (the original trains for 64k iterations ≈ 182 epochs).
- No validation split is used; all 50k images are used for training.
- The initial learning rate is 0.01, since training diverged at the original value of 0.1. The learning rate scheduler is still used (see the sketch after this list).
- The weight decay (regularization parameter) was changed from 0.0001 to 0.0005, as the models tended to overfit overall.
- Each condition was run only once.
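
As a concrete illustration of the learning-rate setting, the sketch below starts at 0.01 as listed above; the milestone epochs and step factor are hypothetical (following the paper's divide-by-10 pattern scaled to 100 epochs), not taken from the repository:

```python
from keras.callbacks import LearningRateScheduler
from keras.regularizers import l2

def schedule(epoch):
    # Start at 0.01 (the paper's 0.1 diverged here);
    # milestones at epochs 50 and 75 are assumptions.
    if epoch < 50:
        return 0.01
    elif epoch < 75:
        return 0.001
    return 0.0001

lr_callback = LearningRateScheduler(schedule)
# model.fit(..., epochs=100, callbacks=[lr_callback])

# The weight decay of 0.0005 would be applied as L2 regularization
# on each convolution, e.g.:
# Conv2D(16, 3, padding='same', kernel_regularizer=l2(5e-4))
```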

## Environment

| Library | Version |
|---|---|
| TensorFlow | 1.10.1 |
| Keras | 2.1.6 |
| mxnet-cu80 | 1.2.1 |
| keras-mxnet | 2.2.2 |
| Chainer | 4.4.0 |
| cupy-cuda80 | 4.4.0 |
| PyTorch | 0.4.1 |
| torchvision | 0.2.1 |
- Experiments were run on Google Colab (GPU runtime).

## Reference

K. He, X. Zhang, S. Ren, and J. Sun. Deep Residual Learning for Image Recognition. In CVPR, 2016. https://arxiv.org/abs/1512.03385