Popular Computer Vision Model Benchmarks
Input Dimensions
- Batch Size = 8, Image = 3 x 224 x 224 (default when nothing else is specified, and for all CPU runs)
- Batch Size = 4, Image = 3 x 224 x 224
GPU USED --- Titan 1080Ti 12 GB
| Model | Framework | Forward Pass | Backward Pass | Total Time | Inference |
| --- | --- | --- | --- | --- | --- |
| VGG16 | Pytorch 0.4.1 | 0.0245 s | 0.0606 s | 0.0852 s | 0.0234 s |
| | Flux 0.6.8+ | 0.0287 s | 0.0760 s | 0.1047 s | 0.0288 s |
| VGG16 BN | Pytorch 0.4.1 | 0.0271 s | 0.0672 s | 0.0943 s | 0.0273 s |
| | Flux 0.6.8+ | 0.0333 s | 0.0818 s | 0.1151 s | 0.0327 s |
| VGG19 | Pytorch 0.4.1 | 0.0281 s | 0.0741 s | 0.1021 s | 0.0280 s |
| | Flux 0.6.8+ | 0.0355 s | 0.0923 s | 0.1278 s | 0.0356 s |
| VGG19 BN | Pytorch 0.4.1 | 0.0321 s | 0.0812 s | 0.1134 s | 0.0325 s |
| | Flux 0.6.8+ | 0.0377 s | 0.0965 s | 0.1342 s | 0.0371 s |
| Resnet18 | Pytorch 0.4.1 | 0.0064 s | 0.0125 s | 0.0190 s | 0.0050 s |
| | Flux 0.6.8+ | 0.0079 s | 0.0218 s | 0.0297 s | 0.0079 s |
| Resnet34 | Pytorch 0.4.1 | 0.0092 s | 0.0216 s | 0.0307 s | 0.0092 s |
| | Flux 0.6.8+ | 0.0137 s | 0.0313 s | 0.0450 s | 0.0151 s |
| Resnet50 | Pytorch 0.4.1 | 0.0155 s | 0.0351 s | 0.0506 s | 0.0152 s |
| | Flux 0.6.8+ | 0.0205 s | 0.1795 s | 0.2000 s | - |
| Resnet101 | Pytorch 0.4.1 | 0.0297 s | 0.0379 s | 0.0676 s | 0.0298 s |
| | Flux 0.6.8+ | 0.0215 s | 0.0616 s | 0.0831 s | 0.0208 s |
| Resnet152 | Pytorch 0.4.1 | 0.0431 s | 0.05337 s | 0.0965 s | 0.0429 s |
| | Flux 0.6.8+ | 0.0308 s | 0.0807 s | 0.1115 s | 0.0298 s |
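
One way to read the GPU numbers above: the Total Time column is the forward pass plus the backward pass (up to rounding), and dividing the Flux total by the PyTorch total gives a relative slowdown. The sketch below hard-codes three rows copied from the GPU table. The names `GPU_TIMES`, `total_time`, and `slowdown` are illustrative only and are not part of either framework.

```python
# Forward and backward times in seconds, copied from the GPU table above.
GPU_TIMES = {
    "VGG16": {"pytorch": (0.0245, 0.0606), "flux": (0.0287, 0.0760)},
    "Resnet18": {"pytorch": (0.0064, 0.0125), "flux": (0.0079, 0.0218)},
    "Resnet152": {"pytorch": (0.0431, 0.05337), "flux": (0.0308, 0.0807)},
}


def total_time(forward, backward):
    """Total time of one training step: forward pass plus backward pass."""
    return forward + backward


def slowdown(model):
    """Ratio of Flux total time to PyTorch total time (above 1 means Flux is slower)."""
    pytorch_total = total_time(*GPU_TIMES[model]["pytorch"])
    flux_total = total_time(*GPU_TIMES[model]["flux"])
    return flux_total / pytorch_total


for model in GPU_TIMES:
    print(f"{model}: Flux/PyTorch total-time ratio = {slowdown(model):.2f}")
```

The same ratio can be taken on the Forward Pass or Inference columns alone, which matters for the deeper ResNets, where Flux has the faster forward pass but the slower backward pass.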
CPU USED --- Intel(R) Xeon(R) Silver 4114 CPU @ 2.20GHz
| Model | Framework | Forward Pass | Backward Pass | Total Time | Inference |
| --- | --- | --- | --- | --- | --- |
| VGG16 | Pytorch 0.4.1 | 6.6024 s | 9.4336 s | 16.036 s | 6.4216 s |
| | Flux 0.6.8+ | 10.458 s | 10.245 s | 20.703 s | 10.111 s |
| VGG16 BN | Pytorch 0.4.1 | 7.0793 s | 9.0536 s | 16.132 s | 6.7909 s |
| | Flux 0.6.8+ | 29.633 s | 18.649 s | 49.282 s | 24.047 s |
| VGG19 | Pytorch 0.4.1 | 8.3075 s | 10.899 s | 19.207 s | 8.0593 s |
| | Flux 0.6.8+ | 12.226 s | 12.457 s | 24.683 s | 12.029 s |
| VGG19 BN | Pytorch 0.4.1 | 8.7794 s | 12.739 s | 21.519 s | 8.4044 s |
| | Flux 0.6.8+ | 28.518 s | 21.464 s | 49.982 s | 22.649 s |
Individual Layer Benchmarks
Layer Descriptions
- Conv3x3/1 = Conv2d, 3x3 Kernel, 1x1 Padding, 1x1 Stride
- Conv5x5/1 = Conv2d, 5x5 Kernel, 2x2 Padding, 1x1 Stride
- Conv3x3/2 = Conv2d, 3x3 Kernel, 1x1 Padding, 2x2 Stride
- Conv5x5/2 = Conv2d, 5x5 Kernel, 2x2 Padding, 2x2 Stride
- Dense = 1024 => 512
- BatchNorm = BatchNorm2d
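
To make the kernel, padding, and stride settings above concrete, the sketch below applies the standard convolution output-size formula, `floor((n + 2p - k) / s) + 1`, to each of the four conv layers. The input size of 224 is an assumption borrowed from the model benchmarks; the source does not state the input shape used for the layer benchmarks. `conv_output_size` is a hypothetical helper, not a PyTorch or Flux function.

```python
def conv_output_size(n, kernel, padding, stride):
    """Spatial output size of a convolution along one dimension."""
    return (n + 2 * padding - kernel) // stride + 1


# (kernel, padding, stride) for each conv layer listed above.
LAYERS = {
    "Conv3x3/1": (3, 1, 1),
    "Conv5x5/1": (5, 2, 1),
    "Conv3x3/2": (3, 1, 2),
    "Conv5x5/2": (5, 2, 2),
}

INPUT_SIZE = 224  # assumed; the layer benchmark input shape is not given

for name, (kernel, padding, stride) in LAYERS.items():
    out = conv_output_size(INPUT_SIZE, kernel, padding, stride)
    print(f"{name}: {INPUT_SIZE} -> {out}")
```

With these paddings the stride-1 layers preserve the spatial size and the stride-2 layers halve it, so the 3x3 and 5x5 variants of each stride produce outputs of the same shape and differ only in kernel cost.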
GPU USED --- Titan 1080Ti 12 GB
| Layer | Framework | Forward Pass | Backward Pass | Total Time |
| --- | --- | --- | --- | --- |
| Conv3x3/1 | Pytorch 0.4.1 | 0.2312 ms | 0.5359 ms | 0.7736 ms |
| | Flux 0.6.8+ | 0.1984 ms | 0.7640 ms | 0.9624 ms |
| Conv5x5/1 | Pytorch 0.4.1 | 0.2667 ms | 0.5345 ms | 0.8299 ms |
| | Flux 0.6.8+ | 0.2065 ms | 0.8075 ms | 1.014 ms |
| Conv3x3/2 | Pytorch 0.4.1 | 0.1170 ms | 0.2203 ms | 0.3376 ms |
| | Flux 0.6.8+ | 0.0927 ms | 0.5988 ms | 0.6915 ms |
| Conv5x5/2 | Pytorch 0.4.1 | 0.1233 ms | 0.2162 ms | 0.3407 ms |
| | Flux 0.6.8+ | 0.0941 ms | 0.6515 ms | 0.7456 ms |
| Dense | Pytorch 0.4.1 | 0.0887 ms | 0.1523 ms | 0.2411 ms |
| | Flux 0.6.8+ | 0.0432 ms | 0.2044 ms | 0.2476 ms |
| BatchNorm | Pytorch 0.4.1 | 0.1096 ms | 0.1999 ms | 0.3095 ms |
| | Flux 0.6.8+ | 0.2211 ms | 0.2849 ms | 0.5060 ms |
NOTE
To reproduce the benchmarks, check out Flux 0.6.8+ at avik-pal/cudnn_batchnorm together with CuArrays master.
GPU BatchNorm is currently broken on Flux 0.6.8+ master, so the benchmarks cannot be run against it.
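
The source does not describe how the timings were collected, so the following is only a minimal sketch of a common approach: discard a few warm-up runs, then average wall-clock time over repeated calls. The `benchmark` helper and its `sync` parameter are hypothetical. On a GPU, `sync` would be a device synchronization call (for example `torch.cuda.synchronize` in PyTorch) so that asynchronous kernels finish before the clock is read.

```python
import time


def benchmark(fn, warmup=3, repeats=10, sync=None):
    """Return the mean wall-clock time in seconds of calling fn()."""
    # Warm-up runs absorb one-off costs such as compilation and cache fills.
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()  # make sure warm-up work has finished before timing starts

    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    if sync is not None:
        sync()  # wait for asynchronous work before stopping the clock
    return (time.perf_counter() - start) / repeats


# Stand-in workload; a real run would wrap a model's forward or backward pass.
mean_seconds = benchmark(lambda: sum(range(10_000)))
print(f"mean time per call: {mean_seconds * 1e3:.4f} ms")
```

Timing the forward and backward passes in separate calls to such a helper would explain why a Total Time entry can differ slightly from the sum of the two columns.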