Soumith Chintala
@moskewcz 3.77TF/s doesn't hold true if you switch to FFT or Winograd based convolutions. References: https://en.wikipedia.org/wiki/Convolution_theorem http://arxiv.org/abs/1509.09308
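A minimal sketch of why a FLOP/s figure computed from direct-convolution arithmetic stops holding once Winograd is used. It implements the 1D F(2,3) case from the arXiv paper linked above: two outputs of a 3-tap filter computed with 4 multiplications instead of 6. The function names and the sample values are made up for illustration; real implementations work on 2D tiles and batches.

```python
def direct_f23(d, g):
    """Two outputs of a 3-tap filter over 4 inputs: 6 multiplications."""
    y0 = d[0] * g[0] + d[1] * g[1] + d[2] * g[2]
    y1 = d[1] * g[0] + d[2] * g[1] + d[3] * g[2]
    return [y0, y1]


def winograd_f23(d, g):
    """Same two outputs via Winograd F(2,3): 4 multiplications."""
    # Filter transform; in practice this is precomputed once per filter.
    u0 = g[0]
    u1 = (g[0] + g[1] + g[2]) / 2
    u2 = (g[0] - g[1] + g[2]) / 2
    u3 = g[2]
    # The only 4 data-dependent multiplications.
    m1 = (d[0] - d[2]) * u0
    m2 = (d[1] + d[2]) * u1
    m3 = (d[2] - d[1]) * u2
    m4 = (d[1] - d[3]) * u3
    # Output transform: additions only.
    return [m1 + m2 + m3, m2 - m3 - m4]


# Illustrative values only.
d = [1.0, 2.0, 3.0, 4.0]
g = [0.5, -1.0, 2.0]

direct = direct_f23(d, g)
fast = winograd_f23(d, g)

# Ratio of multiplications, direct vs. Winograd, for this 1D tile.
mult_ratio = 6 / 4
```

Both functions return the same outputs, but a throughput number obtained by dividing the direct-convolution operation count by wall-clock time would overstate the arithmetic actually executed by `mult_ratio`, so it can exceed the hardware peak without anything being wrong.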
@andravin @moskewcz thanks. I'm going to investigate a bit on why the numbers are much fluffier on my machine. For a start, I'll probably start an end-to-end training and...
@moskewcz I've already verified that it's running on CPU and using Intel code paths, simply by collecting samples from the stack and looking at hotspots.
Caffe is getting no access to the GPUs; I disabled them at the driver level. I just fixed the protobuf to force itself to do the backward phase (it was...
@andravin thanks for the log on your side. I suppose doing pure benchmarking instead of having that lmdb data layer before might be having side effects on IntelCaffe. I'll follow up...
Ok, so today I finally finished building my caffe lmdb for imagenet, and I ran the intel benchmarks with the lmdb data layer etc. etc. (just like how they want...
@scott-gray that sounds super exciting. Can't wait to bench it.
@ozabluda just note that IntelCaffe uses "minimum time over all runs" for the per-layer numbers, whereas regular Caffe uses "average time over all runs". That's one reason why I didn't...
@rsdubtso your suggested flags didn't make much difference -- IntelCaffe went from 3052 ms to 3000 ms
@rsdubtso Taking the minimum timing of each layer rather than the average is a bit misleading and is not standard practice in benchmarking. I think you should consider changing that,...
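To make the min-vs-average point concrete, here is a small sketch. The layer names and timings are hypothetical numbers made up for illustration, not measurements from either Caffe build, and `total_time` is just a helper for this example.

```python
from statistics import mean

# Hypothetical per-layer timings in ms, one entry per run (made-up values).
runs = {
    "conv1": [10.2, 11.0, 14.5, 10.1],
    "conv2": [20.3, 25.1, 20.0, 22.6],
    "fc6": [5.0, 5.4, 7.9, 5.1],
}


def total_time(per_layer_runs, reducer):
    """Sum a per-layer statistic (e.g. min or mean) over all layers."""
    return sum(reducer(times) for times in per_layer_runs.values())


total_min = total_time(runs, min)   # "minimum time over all runs" per layer
total_avg = total_time(runs, mean)  # "average time over all runs" per layer

# Fastest single run measured end to end, for comparison.
best_run = min(sum(run) for run in zip(*runs.values()))

print(f"sum of per-layer minima:  {total_min:.1f} ms")
print(f"fastest single full run:  {best_run:.1f} ms")
print(f"sum of per-layer averages: {total_avg:.1f} ms")
```

The sum of per-layer minima can never be larger than the fastest full run, because each layer's minimum may come from a different run. That is why totals reported the two ways are not directly comparable.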