benchmarks Very slow in running AlexNet with Cifar10 dataset

Hi authors, the throughput I get on AlexNet with the Cifar10 dataset is only ~7000 images/sec on a TITAN X Pascal GPU. May I know what throughput you have achieved, and whether there is any setting that gives better performance? The command I used is: python tf_cnn_benchmarks.py --learning_rate 0.01 --num_gpus 1 --model alexnet --batch_size 1024 --data_name cifar10 --num_batches 100 --data_dir ~/data/tensorflow/cifar-10-batches-py

The tested version of TensorFlow is 1.2.1. Thanks!

Nov 28 '17 03:11 shyhuai

If you want a CIFAR10 example I would use one of these two as they are easier to follow.

I do see an interesting oddity: only one CPU is at 100% during pre-processing, but I doubt that impacts throughput. It is possible that something is wrong and has gone unnoticed because the work so far has focused on ImageNet. We should likely add a test; currently all of the tests for the script are for ImageNet. Assigning to Reed to get someone to check out the CIFAR-10 code path and performance.

On my GTX-1080 I was at ~4K images/sec with real data using CUDA 9 + cuDNN 7 and ~TF 1.4.

I have no idea if the models were the same, but using the model code written by @shyhuai's team I think I might have been closer to 21K images/sec with a batch size of 128 and no distortions on the GTX-1080. I was recording step time, though, and my memory is bad. I think that model was suspect, but nonetheless.
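Since the figure above was recorded as step time rather than images/sec, here is a minimal sketch of the conversion between the two. The function names are hypothetical and not part of tf_cnn_benchmarks; the sketch assumes every step processes exactly one full batch.

```python
def images_per_sec(batch_size: int, step_time_sec: float) -> float:
    """Throughput implied by one training step over `batch_size` images."""
    if step_time_sec <= 0:
        raise ValueError("step_time_sec must be positive")
    return batch_size / step_time_sec


def step_time_ms(batch_size: int, throughput: float) -> float:
    """Per-step time in milliseconds implied by a throughput in images/sec."""
    if throughput <= 0:
        raise ValueError("throughput must be positive")
    return 1000.0 * batch_size / throughput


if __name__ == "__main__":
    # A batch of 128 at roughly 21K images/sec is about 6.1 ms per step.
    print(round(step_time_ms(128, 21000), 2))
```

At a batch size of 128, a difference of a millisecond or two in the remembered step time changes the implied throughput by thousands of images/sec, which is one reason to treat the recalled number with caution.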

Nov 28 '17 04:11 tfboyd

Thanks for your prompt reply. I am one of the authors on the HKBU benchmark team. I am trying to get the best performance that TensorFlow can achieve on both ImageNet and Cifar10, so I moved to this repository.

We have tested TF 1.2 with your revised PR, and it did achieve ~22K images/sec, but that is still much slower than other frameworks such as CNTK (~36K images/sec). That model is slightly different from the one in this repository (tf_cnn_benchmarks), but that difference is not the reason tf_cnn_benchmarks is so slow on AlexNet-Cifar10.

Since TensorFlow's performance has improved so much in recent updates, and the benchmarks have shown good performance on ImageNet, it should achieve similar results on Cifar10. Looking forward to your tests on Cifar10.

Thanks!

Nov 28 '17 06:11 shyhuai

To be clear, I do not plan to run CIFAR-10 benchmarks. Someone might look to make sure tf.data is not acting oddly in this script or overall for small datasets.

Nov 28 '17 16:11 tfboyd

On my GTX 1080, with the same command, I get 5100.76 images/sec. Using synthetic data instead, I get 1880.12 images/sec. Something is very wrong considering we get worse performance on synthetic data, and both numbers are too low.

@bignamehyp, want to take a look? Alternatively I can take a look in a few days.

Nov 28 '17 20:11 reedwm

I was incorrect in stating we got worse performance with synthetic data. It turns out that omitting --data_dir=... always uses synthetic ImageNet data, even when --data_name=cifar10. I will submit a fix to use synthetic cifar10 data when --data_dir is omitted and --data_name=cifar10.
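As a rough illustration of the intended behavior (not the actual tf_cnn_benchmarks code), the sketch below picks the dataset from --data_name first and only then decides between real and synthetic data based on whether --data_dir was given. The names `DatasetSpec` and `resolve_dataset` are hypothetical, and the 224x224 ImageNet shape is an assumed crop size.

```python
from typing import NamedTuple, Optional, Tuple


class DatasetSpec(NamedTuple):
    name: str
    image_shape: Tuple[int, int, int]  # (height, width, channels)
    num_classes: int
    synthetic: bool


# Shapes and class counts per dataset. The ImageNet crop size is an
# assumption for illustration; CIFAR-10 images are 32x32 RGB in 10 classes.
_KNOWN_DATASETS = {
    "imagenet": ((224, 224, 3), 1000),
    "cifar10": ((32, 32, 3), 10),
}


def resolve_dataset(data_name: str, data_dir: Optional[str]) -> DatasetSpec:
    """Choose the dataset by name; fall back to synthetic data of the
    same shape (rather than synthetic ImageNet) when data_dir is omitted."""
    if data_name not in _KNOWN_DATASETS:
        raise ValueError("unknown data_name: %r" % data_name)
    shape, num_classes = _KNOWN_DATASETS[data_name]
    return DatasetSpec(data_name, shape, num_classes, synthetic=data_dir is None)


if __name__ == "__main__":
    print(resolve_dataset("cifar10", None))
```

The point of the sketch is only the ordering: the synthetic fallback is keyed on the dataset name, so omitting the data directory never silently switches datasets.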

The dlbench implementation of AlexNet here is very different from the tf_cnn_benchmarks implementation of AlexNet here, so I don't think they can be meaningfully compared in terms of performance. For example, tf_cnn_benchmarks uses expensive LRN operations, while dlbench does not. I looked at the original AlexNet paper and it did not provide a cifar10 implementation, so I am not sure where these cifar10 AlexNet models originate. I'll ask the author of the tf_cnn_benchmarks cifar10 AlexNet model when I get the chance.
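To make the LRN cost concrete, here is a minimal pure-Python sketch of local response normalization across channels for a single pixel. The function name is hypothetical, and the default constants follow the values reported in the AlexNet paper (k=2, n=5, alpha=1e-4, beta=0.75), which are not necessarily the ones tf_cnn_benchmarks uses.

```python
def lrn_channels(values, depth_radius=2, bias=2.0, alpha=1e-4, beta=0.75):
    """Normalize one pixel's channel vector by its neighbouring channels.

    Each output is values[i] / (bias + alpha * sum of squares over the
    window [i - depth_radius, i + depth_radius]) ** beta.
    """
    n = len(values)
    out = []
    for i, v in enumerate(values):
        lo = max(0, i - depth_radius)
        hi = min(n, i + depth_radius + 1)
        # Window sum of squares plus a fractional power for every activation.
        sq_sum = sum(x * x for x in values[lo:hi])
        out.append(v / (bias + alpha * sq_sum) ** beta)
    return out


if __name__ == "__main__":
    print(lrn_channels([1.0, 2.0, 3.0, 4.0]))
```

Every activation needs a windowed sum of squares and a fractional power, on top of the convolution that produced it, so a model with LRN layers does strictly more work per image than the same model without them.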

Note that the ResNet paper does provide cifar10 implementations, so perhaps that would be a better comparison.

Dec 07 '17 23:12 reedwm

Thanks @reedwm, could you provide some performance numbers for running AlexNet with the corrected synthetic and real cifar10 data?

Dec 08 '17 04:12 shyhuai