Popular Computer Vision Model Benchmarks
Input Dimensions
- Batch Size = 8, Image = 3 x 224 x 224 (IF NOTHING SPECIFIED / CPU USED)
 
- Batch Size = 4, Image = 3 x 224 x 224
 
GPU USED --- Titan 1080Ti 12 GB
| Model | Framework | Forward Pass | Backward Pass | Total Time | Inference |
|---|---|---|---|---|---|
| VGG16 | PyTorch 0.4.1 | 0.0245 s | 0.0606 s | 0.0852 s | 0.0234 s |
| | Flux 0.6.8+ | 0.0287 s | 0.0760 s | 0.1047 s | 0.0288 s |
| VGG16 BN | PyTorch 0.4.1 | 0.0271 s | 0.0672 s | 0.0943 s | 0.0273 s |
| | Flux 0.6.8+ | 0.0333 s | 0.0818 s | 0.1151 s | 0.0327 s |
| VGG19 | PyTorch 0.4.1 | 0.0281 s | 0.0741 s | 0.1021 s | 0.0280 s |
| | Flux 0.6.8+ | 0.0355 s | 0.0923 s | 0.1278 s | 0.0356 s |
| VGG19 BN | PyTorch 0.4.1 | 0.0321 s | 0.0812 s | 0.1134 s | 0.0325 s |
| | Flux 0.6.8+ | 0.0377 s | 0.0965 s | 0.1342 s | 0.0371 s |
| ResNet18 | PyTorch 0.4.1 | 0.0064 s | 0.0125 s | 0.0190 s | 0.0050 s |
| | Flux 0.6.8+ | 0.0079 s | 0.0218 s | 0.0297 s | 0.0079 s |
| ResNet34 | PyTorch 0.4.1 | 0.0092 s | 0.0216 s | 0.0307 s | 0.0092 s |
| | Flux 0.6.8+ | 0.0137 s | 0.0313 s | 0.0450 s | 0.0151 s |
| ResNet50 | PyTorch 0.4.1 | 0.0155 s | 0.0351 s | 0.0506 s | 0.0152 s |
| | Flux 0.6.8+ | 0.0205 s | 0.1795 s | 0.2000 s | - |
| ResNet101 | PyTorch 0.4.1 | 0.0297 s | 0.0379 s | 0.0676 s | 0.0298 s |
| | Flux 0.6.8+ | 0.0215 s | 0.0616 s | 0.0831 s | 0.0208 s |
| ResNet152 | PyTorch 0.4.1 | 0.0431 s | 0.05337 s | 0.0965 s | 0.0429 s |
| | Flux 0.6.8+ | 0.0308 s | 0.0807 s | 0.1115 s | 0.0298 s |
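
The relative cost of the two frameworks is easier to read as a ratio than as raw seconds. The following is a small sketch, not part of the original benchmark code, that divides the Flux total time by the PyTorch total time for each model. The numbers are copied from the Total Time column of the GPU table; the names `gpu_total` and `slowdown` are introduced here for illustration only.

```python
# Total time per batch in seconds, copied from the GPU table above.
# Each entry is (PyTorch 0.4.1, Flux 0.6.8+).
gpu_total = {
    "VGG16": (0.0852, 0.1047),
    "VGG16 BN": (0.0943, 0.1151),
    "VGG19": (0.1021, 0.1278),
    "VGG19 BN": (0.1134, 0.1342),
    "ResNet18": (0.0190, 0.0297),
    "ResNet34": (0.0307, 0.0450),
    "ResNet50": (0.0506, 0.2000),
    "ResNet101": (0.0676, 0.0831),
    "ResNet152": (0.0965, 0.1115),
}


def slowdown(pytorch_s, flux_s):
    """Return how many times longer Flux took than PyTorch."""
    return flux_s / pytorch_s


ratios = {model: slowdown(*times) for model, times in gpu_total.items()}

for model, ratio in ratios.items():
    print(f"{model:10s} Flux / PyTorch = {ratio:.2f}x")
```

A ratio above 1 means Flux took longer on that model. The same helper can be pointed at the Forward Pass or Backward Pass columns to see where the gap comes from.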
CPU USED --- Intel(R) Xeon(R) Silver 4114 CPU @ 2.20GHz
| Model | Framework | Forward Pass | Backward Pass | Total Time | Inference |
|---|---|---|---|---|---|
| VGG16 | PyTorch 0.4.1 | 6.6024 s | 9.4336 s | 16.036 s | 6.4216 s |
| | Flux 0.6.8+ | 10.458 s | 10.245 s | 20.703 s | 10.111 s |
| VGG16 BN | PyTorch 0.4.1 | 7.0793 s | 9.0536 s | 16.132 s | 6.7909 s |
| | Flux 0.6.8+ | 29.633 s | 18.649 s | 49.282 s | 24.047 s |
| VGG19 | PyTorch 0.4.1 | 8.3075 s | 10.899 s | 19.207 s | 8.0593 s |
| | Flux 0.6.8+ | 12.226 s | 12.457 s | 24.683 s | 12.029 s |
| VGG19 BN | PyTorch 0.4.1 | 8.7794 s | 12.739 s | 21.519 s | 8.4044 s |
| | Flux 0.6.8+ | 28.518 s | 21.464 s | 49.982 s | 22.649 s |
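
Per-batch latency can be turned into throughput by dividing the batch size by the time taken. The sketch below assumes the CPU runs used the batch size of 8 listed under Input Dimensions, which the source implies but does not state row by row. The inference times are copied from the CPU table, and `images_per_second` is a helper name introduced here.

```python
BATCH_SIZE = 8  # assumption: the CPU runs used the batch size from Input Dimensions

# Inference time per batch in seconds, copied from the CPU table above.
# Each entry is (PyTorch 0.4.1, Flux 0.6.8+).
cpu_inference_s = {
    "VGG16": (6.4216, 10.111),
    "VGG16 BN": (6.7909, 24.047),
    "VGG19": (8.0593, 12.029),
    "VGG19 BN": (8.4044, 22.649),
}


def images_per_second(batch_size, seconds):
    """Convert the latency of one batch into images processed per second."""
    return batch_size / seconds


throughput = {
    model: tuple(images_per_second(BATCH_SIZE, t) for t in times)
    for model, times in cpu_inference_s.items()
}

for model, (pytorch_ips, flux_ips) in throughput.items():
    print(f"{model:9s} PyTorch {pytorch_ips:.2f} img/s   Flux {flux_ips:.2f} img/s")
```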
Individual Layer Benchmarks
Layer Descriptions
- Conv3x3/1 = Conv2d, 3x3 Kernel, 1x1 Padding, 1x1 Stride
 
- Conv5x5/1 = Conv2d, 5x5 Kernel, 2x2 Padding, 1x1 Stride
 
- Conv3x3/2 = Conv2d, 3x3 Kernel, 1x1 Padding, 2x2 Stride
 
- Conv5x5/2 = Conv2d, 5x5 Kernel, 2x2 Padding, 2x2 Stride
 
- Dense = 1024 => 512
 
- BatchNorm = BatchNorm2d
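
The kernel, padding, and stride settings above determine the spatial size of each convolution's output. The sketch below applies the standard output-size formula to the four convolution configurations. The source does not state the input size used for the layer benchmarks, so the 224-pixel side is an assumption borrowed from the model benchmarks, and the Dense parameter count assumes a bias term is included. `conv_out` is a helper name introduced here.

```python
def conv_out(size, kernel, padding, stride):
    """Output side length of a 2D convolution along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


# (kernel, padding, stride) for each layer in the list above.
layers = {
    "Conv3x3/1": (3, 1, 1),
    "Conv5x5/1": (5, 2, 1),
    "Conv3x3/2": (3, 1, 2),
    "Conv5x5/2": (5, 2, 2),
}

side = 224  # assumption: the layer benchmark input size is not given in the source

out_sides = {name: conv_out(side, *cfg) for name, cfg in layers.items()}

# Dense 1024 => 512: one weight per input/output pair, plus one bias per output
# (the bias term is an assumption).
dense_params = 1024 * 512 + 512

for name, out in out_sides.items():
    print(f"{name}: {side}x{side} -> {out}x{out}")
print(f"Dense parameters: {dense_params}")
```

With these settings the stride-1 layers preserve the spatial size and the stride-2 layers halve it, which is consistent with the stride-2 rows showing lower times in the layer table.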
 
GPU USED --- Titan 1080Ti 12 GB
| Layer | Framework | Forward Pass | Backward Pass | Total Time |
|---|---|---|---|---|
| Conv3x3/1 | PyTorch 0.4.1 | 0.2312 ms | 0.5359 ms | 0.7736 ms |
| | Flux 0.6.8+ | 0.1984 ms | 0.7640 ms | 0.9624 ms |
| Conv5x5/1 | PyTorch 0.4.1 | 0.2667 ms | 0.5345 ms | 0.8299 ms |
| | Flux 0.6.8+ | 0.2065 ms | 0.8075 ms | 1.014 ms |
| Conv3x3/2 | PyTorch 0.4.1 | 0.1170 ms | 0.2203 ms | 0.3376 ms |
| | Flux 0.6.8+ | 0.0927 ms | 0.5988 ms | 0.6915 ms |
| Conv5x5/2 | PyTorch 0.4.1 | 0.1233 ms | 0.2162 ms | 0.3407 ms |
| | Flux 0.6.8+ | 0.0941 ms | 0.6515 ms | 0.7456 ms |
| Dense | PyTorch 0.4.1 | 0.0887 ms | 0.1523 ms | 0.2411 ms |
| | Flux 0.6.8+ | 0.0432 ms | 0.2044 ms | 0.2476 ms |
| BatchNorm | PyTorch 0.4.1 | 0.1096 ms | 0.1999 ms | 0.3095 ms |
| | Flux 0.6.8+ | 0.2211 ms | 0.2849 ms | 0.5060 ms |
NOTE
To reproduce the benchmarks, check out avik-pal/cudnn_batchnorm for Flux 0.6.8+ and master for CuArrays.
BatchNorm on the GPU is broken on Flux 0.6.8+ master, so the benchmarks cannot be run against it.
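
The source does not say how the timings were collected. The following is a minimal, framework-agnostic sketch of one common approach: run a few warm-up calls, then average the wall-clock time over several measured calls. The function `time_call` and its parameters are hypothetical names introduced here, not part of either framework. The optional `sync` hook matters on a GPU, where kernels are launched asynchronously; with PyTorch, `torch.cuda.synchronize` could be passed for that purpose.

```python
import time
from statistics import mean


def time_call(fn, warmup=2, iters=10, sync=None):
    """Return the mean wall-clock seconds per call of fn().

    sync, if given, is called after each measured run so that queued
    asynchronous work (for example GPU kernels) finishes before the
    clock is stopped.
    """
    for _ in range(warmup):
        fn()  # warm-up runs are not measured
    if sync is not None:
        sync()  # make sure warm-up work is done before timing starts

    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()
        samples.append(time.perf_counter() - start)
    return mean(samples)


# Stand-in CPU workload; a real run would call the model's forward pass here.
def work():
    return sum(i * i for i in range(10_000))


print(f"{time_call(work):.6f} s per call")
```

Timing the forward pass and the backward pass as separate callables, and the two together as a third, would give the three columns reported in the tables.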