Oleg Zabluda
Because it's AlexNet from the "One weird trick" paper: https://arxiv.org/abs/1404.5997. The classic AlexNet 2-column topology was caused by him using 2 GPUs with 3GB RAM each (maximum at the time). As...
The 2-column AlexNet that Intel is benchmarking in the announcement (different from the 1-column "One weird trick" AlexNet in Soumith's benchmark) has 1449 MFLOPs per image in the forward pass and 2x that...
@soumith>A full [forward + backward] on AlexNet on a Desktop 6-core Intel(R) Core(TM) i7-5930K CPU @ 3.50GHz takes an average of 164ms EDIT: 268 ms. [...] I need a couple...
@moskewcz I also noticed that Intel's Caffe seems to report timings for conv layers per image and for fc per minibatch. I corrected the table above (I also realized Soumith's...
@moskewcz Stock Caffe timings sure are per minibatch (like Soumith's OpenBLAS timings). Intel's port timings do look like garbage (say 0.726ms for conv1), unless they are per image (except for...
@andravin> The benchmark used dual E5-2699-v3 CPUs, which have 18 cores at 2.3 GHz => 2 x 18 x 32 FLOPs/cycle x 2.3 GHz = 2.65 TFLOPs Actual AVX base clock is 1.9 GHz (see quote below). 2 CPU \* 18...
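For anyone who wants to redo the peak-throughput arithmetic from the comment above, here is a minimal sketch. It uses only the figures quoted there (2 sockets, 18 cores each, 32 single-precision FLOPs per cycle per core, 2.3 GHz nominal vs 1.9 GHz AVX base). The function name `peak_tflops` is my own, and the second result is just my recomputation with the lower clock, not a number taken from the truncated text.

```python
def peak_tflops(sockets, cores_per_socket, flops_per_cycle, clock_ghz):
    """Theoretical peak in TFLOP/s: sockets * cores * FLOPs/cycle * GHz."""
    gflops = sockets * cores_per_socket * flops_per_cycle * clock_ghz
    return gflops / 1000.0


# Figures as quoted in the comment for dual E5-2699-v3.
nominal = peak_tflops(2, 18, 32, 2.3)   # at the nominal 2.3 GHz clock
avx_base = peak_tflops(2, 18, 32, 1.9)  # at the 1.9 GHz AVX base clock

print(f"nominal clock:  {nominal:.2f} TFLOP/s")
print(f"AVX base clock: {avx_base:.2f} TFLOP/s")
```

This is only an upper bound: it assumes every core issues two 256-bit FMAs every cycle, which real convolution code does not sustain.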
@andravin: > Ah, I did not know about this feature of Xeon processors, thanks. So it is Xeon only? My i7-5775C seems to sustain AVX2 256-bit FMA instructions at regular...
@moskewcz: > i think there are some issues with your per-layer analysis in your second comment. firstly, i don't think we can trust the per-layer #s from the caffe log...
@andravin>Here are official Intel documents about avx and frequencies for Xeon E5 v3, does not mention other processors, which of course leaves us wondering: Thank you. These are good. I...
@soumith: > @moskewcz 3.77TF/s doesn't hold true if you switch to FFT or Winograd based convolutions. > http://arxiv.org/abs/1509.09308 This is pretty awesome, @andravin