Andrew Lavin comments

Results: 61 comments of Andrew Lavin

[October 2015] Intel are CPU magicians. But there's no one weird trick....

@soumith Oh I see now you are using your own prototxt file, not the one that was provided by Intel. Obviously there is something wrong that is causing your prototxt...

[October 2015] Intel are CPU magicians. But there's no one weird trick....

Actually I get reasonable numbers using your alexnet.prototxt too. So I am not sure what is wrong with your setup.

[October 2015] Intel are CPU magicians. But there's no one weird trick....

@ozabluda Here are official Intel documents about AVX and frequencies for the Xeon E5 v3. They do not mention other processors, which of course leaves us wondering: http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/performance-xeon-e5-v3-advanced-vector-extensions-paper.pdf http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/xeon-e5-v3-spec-update.pdf Still haven't found...

[October 2015] Intel are CPU magicians. But there's no one weird trick....

Thanks, @ozabluda. I'm looking forward to the first truly efficient implementations of the fast Winograd convnet algorithms. The first draft of the paper was just a teaser. ;-)

[October 2015] Intel are CPU magicians. But there's no one weird trick....

@ozabluda Thanks. One thing I think you are missing is that transformed data can be reused for convolution with every filter. So the data transform FLOPs can be amortized over...
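
To illustrate the amortization point, here is a minimal sketch, not taken from the thread. It assumes a data transform that costs a fixed number of FLOPs per input tile and is computed once, then reused by every filter. The function name and the example numbers (32 FLOPs per tile, 64 filters) are hypothetical and chosen only for illustration.

```python
def amortized_transform_flops(transform_flops_per_tile, num_filters):
    """Data-transform FLOPs charged to each filter for one input tile.

    Assumption: the transformed tile is computed once and reused by all
    num_filters filters, so its cost is shared evenly among them.
    """
    if num_filters <= 0:
        raise ValueError("num_filters must be positive")
    return transform_flops_per_tile / num_filters


# Hypothetical numbers: a 32-FLOP tile transform shared by 64 filters.
per_filter = amortized_transform_flops(32, 64)
print(per_filter)
```

As the number of filters grows, the per-filter share of the transform cost shrinks toward zero, which is the sense in which the transform can be amortized.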

[October 2015] Intel are CPU magicians. But there's no one weird trick....

Thanks, @rsdubtso. We found that whitepaper, but it only explicitly mentions Xeon E5 v3 processors. Are other processors (e.g. i7) affected by AVX2 frequencies? If so, where can we find...

[October 2015] Intel are CPU magicians. But there's no one weird trick....

@scott-gray strikes again! Well done.

[October 2015] Intel are CPU magicians. But there's no one weird trick....

@ozabluda F(2x2,3x3) has a maximum speedup of (2x2x3x3)/(4x4) = 2.25. In general, the max speedup for F(mxn,rxs) is (m n r s) / ((m+r-1)(n+s-1)).
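
The formula above can be checked with a short sketch. This is an illustration only, not code from the thread; the function name `max_speedup` is made up here. It compares the multiplication count of direct convolution for an m x n output tile with an r x s filter (m n r s) against the (m+r-1)(n+s-1) multiplications of the fast algorithm.

```python
from fractions import Fraction


def max_speedup(m, n, r, s):
    """Upper bound on the multiplication-count speedup of F(mxn, rxs)."""
    direct = m * n * r * s               # multiplications, direct convolution
    fast = (m + r - 1) * (n + s - 1)     # multiplications, fast algorithm
    return Fraction(direct, fast)


print(float(max_speedup(2, 2, 3, 3)))  # 2.25, matching the figure above
```

The bound counts multiplications only; transform costs are ignored, so real speedups will be lower.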

[October 2015] Intel are CPU magicians. But there's no one weird trick....

Because multiplication, addition, and multiply-accumulate all have the same throughput, I count them all equally. That not only makes the analysis simpler, but also gives you a more accurate accounting...

[August 2014] Discussion of results

Perhaps the gradient w.r.t. inputs should be left out of the L1 benchmark, because it would not actually be computed in the first layer of a real network. Theano results...