Soumith Chintala
propagate_down is indeed true! [This is the benchmark code, and the place where it is set to true.](https://github.com/BVLC/caffe/blob/b1c4f121b4f01b538eef6997ba3af6c9a71afd31/tools/net_speed_benchmark.cpp#L90)
yes, it does that here: https://github.com/BVLC/caffe/blob/master/src/caffe/layers/conv_layer.cu#L79
if we want a double confirmation, we can ask @rbgirshick who is watching this repo. Ross, what do you say?
> Isn't SpatialConvolutionMM supposed to be using the same tricks anyhow?

Yes, SpatialConvolutionMM is borrowed from Caffe (thanks guys), but we are using the cublas v1 interface (whereas Caffe uses...
@f0k that does seem to be a good point! The utility I use is Caffe's own, and looking through [layer.hpp](https://github.com/BVLC/caffe/blob/master/include/caffe/layer.hpp), I don't see an explicit synchronize either. I'll have to...
> but maybe the other benchmarks should do the same to ensure only the GPU time is measured, or the caffe benchmark should time "from the outside" just like the...
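The concern in the quote above can be sketched generically: on an asynchronous backend, a kernel launch returns before the GPU has done any work, so timing "from the outside" only measures GPU time if the clock reads are bracketed by explicit synchronization. A minimal Python sketch, where `launch_kernel` and `synchronize` are hypothetical stand-ins for a framework's async launch and device sync (not Caffe's actual API):

```python
import time

def time_forward(launch_kernel, synchronize, iterations=10):
    """Time `iterations` async launches, synchronizing before each clock read.

    Without the final synchronize(), the timer would only capture the cost
    of *enqueueing* the kernels, not of running them on the device.
    """
    synchronize()              # drain any pending work before starting
    start = time.time()
    for _ in range(iterations):
        launch_kernel()        # returns immediately on an async backend
    synchronize()              # wait for all launched work to finish
    return (time.time() - start) / iterations
```

This is the same pattern regardless of framework: sync, start clock, launch, sync, stop clock.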
I benchmarked Theano, Torch, ccn2, and Caffe with CUDA 6.5. Alex also updated ccn2 with some more performance improvements, and I fixed the Theano benchmark to average over 10 iterations (like...
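Averaging over several iterations, with warm-up runs excluded, can be sketched as follows; `run_net` is a hypothetical callable standing in for one forward/backward pass of whichever framework is being benchmarked:

```python
import time

def benchmark(run_net, iterations=10, warmup=1):
    """Average wall-clock time of `run_net` over `iterations` runs,
    after `warmup` untimed runs to exclude one-time costs
    (allocation, compilation, autotuning)."""
    for _ in range(warmup):
        run_net()
    start = time.time()
    for _ in range(iterations):
        run_net()
    return (time.time() - start) / iterations
```

Excluding the first run matters especially for Theano, where the first call can include compilation overhead that would skew a single-shot measurement.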
right, fixed.
Thanks to @nouiz, who fixed an issue with the Theano FFT gradInput; the Theano FFT implementation is now 1.5x faster than Caffe (which is in 2nd place). Now is the time to also...
@stencilman it depends on how transparent you want it to be (multi-GPU can already be done in Torch if you are careful with a few things), but this is not...