Andrew Lavin

Results: 61 comments by Andrew Lavin

@soumith Oh, I see now: you are using your own prototxt file, not the one provided by Intel. Obviously there is something wrong that is causing your prototxt...

Actually I get reasonable numbers using your alexnet.prototxt too. So I am not sure what is wrong with your setup.

@ozabluda Here are the official Intel documents about AVX and frequencies for Xeon E5 v3. They do not mention other processors, which of course leaves us wondering:
http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/performance-xeon-e5-v3-advanced-vector-extensions-paper.pdf
http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/xeon-e5-v3-spec-update.pdf
Still haven't found...

Thanks, @ozabluda. I'm looking forward to the first truly efficient implementations of the fast Winograd convnet algorithms. The first draft of the paper was just a teaser. ;-)

@ozabluda Thanks. One thing I think you are missing is that transformed data can be reused for convolution with every filter. So the data transform FLOPs can be amortized over...

Thanks, @rsdubtso. We found that whitepaper, but it only explicitly mentions Xeon E5 v3 processors. Are other processors (e.g., i7) affected by AVX2 frequencies? If so, where can we find...

@ozabluda F(2x2, 3x3) has a maximum speedup of (2x2x3x3)/(4x4) = 2.25. In general, the max speedup for F(mxn, rxs) is (m n r s) / ((m+r-1)(n+s-1)).
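
As a quick check of that formula, here is a minimal Python sketch. The function name `winograd_max_speedup` is made up for illustration and does not come from any library. It only counts multiplications in the elementwise stage and ignores transform cost, so it gives an upper bound rather than a measured speedup.

```python
from fractions import Fraction


def winograd_max_speedup(m, n, r, s):
    """Upper bound on the multiplication reduction for F(m x n, r x s).

    Hypothetical helper: counts only multiplies, ignoring transform cost.
    """
    # Direct convolution: each of the m*n outputs needs r*s multiplies.
    direct = m * n * r * s
    # Fast algorithm: one multiply per element of the (m+r-1) x (n+s-1) tile.
    fast = (m + r - 1) * (n + s - 1)
    return Fraction(direct, fast)


print(float(winograd_max_speedup(2, 2, 3, 3)))  # (2*2*3*3)/(4*4) = 2.25
print(float(winograd_max_speedup(4, 4, 3, 3)))  # (4*4*3*3)/(6*6) = 4.0
```

Using `Fraction` keeps the ratio exact, so 36/16 reduces to 9/4 before it is converted to a float for printing.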

Because multiplication, addition, and multiply-accumulate all have the same throughput, I count them all equally. That not only makes the analysis simpler, but gives you a more accurate accounting...

Perhaps the gradient w.r.t. inputs should be left out of the L1 benchmark, because it would not actually be computed in the first layer of a real network. Theano results...