Andrew Lavin
@jbalma yes I find strong scaling the training problem to be very interesting, and of course that is an active area of research. I hope you find a good forum...
Hi @nomi-wei , just a clarification: our fast convnet algorithms use [Winograd's _convolution_ algorithms](https://www.encyclopediaofmath.org/index.php/Winograd_small_convolution_algorithm). But the same Shmuel Winograd did co-author the [Coppersmith-Winograd fast matrix multiplication algorithm](https://en.wikipedia.org/wiki/Coppersmith%E2%80%93Winograd_algorithm), so the confusion...
"With these optimizations time to train AlexNet\* network on full ILSVRC-2012 dataset to 80% top5 accuracy reduces from 58 days to about 5 days." The benchmark used dual E5-2699-v3 CPUs,...
So anyway the numbers Intel reported sound plausible, but your numbers don't. :-)
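As a quick sanity check on the quoted figures (58 days down to about 5 days, per Intel's post; these are their numbers, not new measurements), that ratio is easy to compute:

```cpp
#include <cassert>

// Ratio of baseline to optimized training time.
// 58 days -> ~5 days works out to roughly an 11.6x speedup.
double speedup(double baseline_days, double optimized_days) {
    return baseline_days / optimized_days;
}
```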
... and having looked a bit at Caffe's CPU implementation, im2col is single-threaded, and will be a pretty nasty bottleneck on a 36-core system.
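For reference, im2col lowers every convolution window into a column of a matrix, so the convolution becomes one big GEMM. A minimal single-threaded sketch of the idea (simplified: stride 1, no padding, hypothetical signature; Caffe's real `im2col_cpu` also handles padding, stride, and dilation) looks like:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal im2col: input is C x H x W (row-major), square kernel K x K,
// stride 1, no padding. Output is a (C*K*K) x (Ho*Wo) matrix, row-major.
std::vector<float> im2col(const std::vector<float>& in,
                          int C, int H, int W, int K) {
    int Ho = H - K + 1, Wo = W - K + 1;
    std::vector<float> out((size_t)C * K * K * Ho * Wo);
    // These loops are pure data movement, and in Caffe's CPU path they
    // run on a single thread -- so on a 36-core machine the lowering
    // step serializes while the GEMM (via a threaded BLAS) scales.
    for (int c = 0; c < C; ++c)
        for (int kh = 0; kh < K; ++kh)
            for (int kw = 0; kw < K; ++kw) {
                int row = (c * K + kh) * K + kw;
                for (int oh = 0; oh < Ho; ++oh)
                    for (int ow = 0; ow < Wo; ++ow)
                        out[(size_t)row * Ho * Wo + (size_t)oh * Wo + ow] =
                            in[(size_t)c * H * W + (size_t)(oh + kh) * W
                               + (ow + kw)];
            }
    return out;
}
```

The copy is memory-bound with no dependences across `c`, so it parallelizes trivially (e.g. an OpenMP pragma on the outer loop), which is why leaving it single-threaded hurts so much at high core counts.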
@moskewcz your numbers sound plausible to me... and so Intel's post really points to what a disaster out-of-the-box Caffe performance must be on many-core CPUs.
There is no such thing as an AVX2 clock.
@ozabluda Ah, I did not know about this feature of Xeon processors, thanks. So it is Xeon only? soumith's Core(TM) i7-5930K will not have this? My i7-5775C seems to sustain...
I tracked down AVX base frequency specs for Haswell E5 processors here: https://www.microway.com/knowledge-center-articles/detailed-specifications-intel-xeon-e5-2600v3-haswell-ep-processors/ Would be nice to find an official Intel source. I suspect this is only a feature of...
@soumith What command line did you use? README.txt says:

```
For timing
#> ./build/tools/caffe time \
     -iterations \
     --model=models/intel_alexnet/train_val.prototxt
```

When I run that on my 4-core i7-5775C I get:...