convnet-benchmarks
benchmark MXNet and Chainer. Compare with TensorFlow and others.
[reserved for review]
<3
An unofficial preliminary benchmark result is presented in https://github.com/dmlc/mxnet/issues/378#issuecomment-156730363. It is very surprising that MXNet is much faster than Caffe.
This was due to incorrect timing of the async API; there is no magic that lets one library run much faster than the others when all of them use cuDNN.
@futurely Also, I noticed you are using the simple factory instead of the correct Inception factory. I am working on Conv/LSTM timing. Sorry for the delay; I am traveling these days.
The simple factory's results are not presented there. To make sure the models for both libraries are exactly the same, only the GoogLeNet and VGG-16 models from the Caffe model zoo, and their conversions to the MXNet format, are used. Looking forward to your async API timing.
Both the MXNet and Chainer scripts are ready, thanks to Kenta Oono and @antinucleon. As some of you might know, the ICLR deadline is on Thursday, so I'm a bit busy with that; I will benchmark over the coming weekend.
Any luck with results?
+1
+1
+1. How is it going by now?
Guys, rather than hassling Soumith, who does have a full-time job and stuff to do ;-) perhaps you might consider creating a pull request with a script, so Soumith simply has to do `git pull` and run your script :-) You can see that this is what Fabian did in https://github.com/soumith/convnet-benchmarks/pull/49, and I did in https://github.com/soumith/convnet-benchmarks/pull/47, for our own libraries, for example.
To be fair to both the Chainer and MXNet folks, they gave me scripts to benchmark. I put it off because of NIPS / ICLR, and their libraries have since changed APIs, so I am stuck fixing the Chainer scripts. As always, working on it, at my own pace.
Just finished Chainer. Working on MXNet ...
I committed the MXNet AlexNet + GoogLeNet scripts that @antinucleon had given me. I wanted to get some experience with MXNet before I benchmarked it, because it can use multiple threads etc.; hence the delay (I didn't find time to read the docs and get familiar). If anyone who wants to see the MXNet benchmarks finishes the VGG script and fixes the error in the GoogLeNet script, I can run them on the Titan X cards and report the numbers. The Chainer logs, by the way, are all checked in via: https://github.com/soumith/convnet-benchmarks/commit/c4dfa528cd7f2abd2e9abd91b294f91d01146c42
It looks like there's a bug in the Chainer benchmark: the averaged time is computed as `total / niter - 1` instead of `total / (niter - 1)`.
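The operator-precedence slip is easy to reproduce in plain Python (the numbers below are made up for illustration, not taken from the benchmark logs):

```python
niter = 10
total = 4.5  # pretend total measured seconds across the timed iterations

wrong = total / niter - 1    # parsed as (total / niter) - 1
right = total / (niter - 1)  # intended: average over niter - 1 iterations

print(wrong)  # -0.55 -- a negative "average time"
print(right)  # 0.5
```

Division binds tighter than subtraction, so without the parentheses the code subtracts a full second from the per-iteration average rather than excluding one iteration from the count.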
Another thing I noticed is that the script uses `cuda.Event()` to measure the time for the backward pass, while using the standard Python `time()` to measure the time for the forward pass. Does `cuda.get_elapsed_time(cudaStartEvent, cudaEndEvent)` measure the computation time in the backward pass before the CUDA kernel launch? I'm asking because Chainer apparently does a lot of work in Python (potentially negligible) before passing to libcudnn on each forward and backward call.
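A minimal sketch of the wall-clock side of this, not Chainer's actual code: CUDA events bracket only GPU work, whereas Python wall-clock timing also captures the Python-side overhead before the kernel launch, and it only measures anything meaningful if the device is synchronized before the clock is read. The function name and warm-up convention here are assumptions for illustration:

```python
import time

def time_wall_clock(fn, niter=10):
    """Average wall-clock time of fn over niter - 1 iterations.

    Includes any host-side (Python) overhead before kernel launch.
    The first iteration is treated as warm-up and excluded, so the
    average divides by (niter - 1) -- note the parentheses.
    """
    total = 0.0
    for i in range(niter):
        start = time.perf_counter()
        fn()
        # On a real GPU you would synchronize the device here before
        # reading the clock; kernel launches return asynchronously, so
        # without a sync the measured time is far too small.
        elapsed = time.perf_counter() - start
        if i > 0:  # skip the warm-up iteration
            total += elapsed
    return total / (niter - 1)
```

CUDA-event timing (`cuda.get_elapsed_time`) would instead record events on the stream around the launch and report only the time between them on the device, which is why mixing the two methods in one benchmark makes forward and backward numbers hard to compare.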
+1 want to know which one is faster!
+1 I'm looking for it.