
benchmark MXNet and Chainer. Compare with TensorFlow and others.

Open soumith opened this issue 8 years ago • 17 comments

[reserved for review]

soumith avatar Nov 12 '15 05:11 soumith

<3

liuliu avatar Nov 12 '15 05:11 liuliu

An unofficial preliminary benchmark result is presented in https://github.com/dmlc/mxnet/issues/378#issuecomment-156730363. It is very surprising that MXNet is much faster than Caffe.

futurely avatar Nov 14 '15 18:11 futurely

This was due to incorrect timing of the async API: the clock was stopped before the queued operations had actually finished. There is no magic way to run much faster than the others when everyone is using cuDNN.

tqchen avatar Nov 14 '15 18:11 tqchen
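
A toy sketch of the pitfall tqchen describes: with an asynchronous engine, calls return as soon as the work is *queued*, so stopping the clock right after the call measures only launch overhead. The `FakeAsyncEngine` below is purely illustrative (a background thread standing in for the GPU stream), not MXNet's API; MXNet's real synchronization point is `mx.nd.waitall()`.

```python
import threading
import time

class FakeAsyncEngine:
    """Stand-in for an async execution engine: launch() returns
    immediately; the work finishes in a background thread."""
    def __init__(self):
        self._threads = []

    def launch(self, seconds):
        t = threading.Thread(target=time.sleep, args=(seconds,))
        t.start()
        self._threads.append(t)

    def wait_all(self):
        """Block until all launched work has completed
        (analogous to mx.nd.waitall() in MXNet)."""
        for t in self._threads:
            t.join()
        self._threads.clear()

engine = FakeAsyncEngine()

start = time.time()
engine.launch(0.2)               # returns almost instantly
naive = time.time() - start      # measures only launch overhead

engine.wait_all()                # synchronize before reading the clock
correct = time.time() - start    # now includes the real runtime
```

Reading the clock before synchronizing is exactly how an async framework can appear "much faster" than a synchronous one in a naive benchmark.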

@futurely Also, I notice you are using the simple factory instead of the correct inception factory. I am working on the Conv/LSTM timing. Sorry for the delay; I am traveling these days.

antinucleon avatar Nov 14 '15 19:11 antinucleon

The simple factory's results are not presented there. To make sure the models for both libraries are exactly the same, only the GoogLeNet and VGG-16 models from the Caffe model zoo, and their conversions to the MXNet format, were used. Looking forward to your corrected async API timings.

futurely avatar Nov 14 '15 19:11 futurely

Both the MXNet and Chainer scripts are ready, thanks to Kenta Oono and @antinucleon. As some of you might know, the ICLR deadline is on Thursday, so I'm a bit too busy with that; I will benchmark over the coming weekend.

soumith avatar Nov 16 '15 22:11 soumith

Any luck with results?

Strateus avatar Nov 23 '15 17:11 Strateus

+1

adamist521 avatar Dec 10 '15 06:12 adamist521

+1

saikrishb avatar Dec 22 '15 02:12 saikrishb

+1. How is it going by now?

wangg12 avatar Dec 22 '15 05:12 wangg12

Guys, rather than hassling Soumith, who does have a full-time job and stuff to do ;-) perhaps you might consider creating a pull request with a script, so Soumith simply has to git pull and run your script :-) You can see that this is what Fabian did in https://github.com/soumith/convnet-benchmarks/pull/49 , and I did in https://github.com/soumith/convnet-benchmarks/pull/47 , for our own libraries, for example.

hughperkins avatar Dec 22 '15 06:12 hughperkins

To be fair to both the Chainer and MXNet folks, they gave me scripts to benchmark. I put it off because of NIPS / ICLR, and their libraries have changed APIs since, so I am stuck fixing the Chainer scripts. As always, working on it, at my own pace.

soumith avatar Dec 22 '15 15:12 soumith

Just finished Chainer. Working on MXNet ...

soumith avatar Dec 22 '15 20:12 soumith

I committed the MXNet AlexNet + GoogLeNet scripts that @antinucleon had given me. I wanted to get some experience with MXNet before benchmarking it, because it can use multiple threads etc.; hence the delay (I didn't find time to read the docs, get familiar, etc.). If anyone who wants to see the MXNet benchmarks finishes the VGG script and fixes the error in the GoogLeNet script, I can run them on the Titan X cards and report the numbers. The Chainer logs, by the way, are all checked in via: https://github.com/soumith/convnet-benchmarks/commit/c4dfa528cd7f2abd2e9abd91b294f91d01146c42

soumith avatar Jan 05 '16 08:01 soumith

It looks like there's a bug in the Chainer benchmark.

They computed the average time as `total / niter - 1` instead of `total / (niter - 1)`.
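
The precedence pitfall is easy to reproduce in a few lines (the numbers below are illustrative, not from the benchmark):

```python
niter = 11   # hypothetical iteration count
total = 5.0  # hypothetical total seconds over niter - 1 timed iterations

# Division binds tighter than subtraction, so this is (total / niter) - 1,
# not the intended per-iteration average:
wrong = total / niter - 1      # ≈ -0.545 — can even go negative

# The intended expression needs explicit parentheses:
right = total / (niter - 1)    # 0.5 seconds per iteration
```

Any negative "average time" in the logs would be a giveaway that the unparenthesized form was used.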

[image: screenshot of the Chainer benchmark's timing code]


Another thing I noticed is that the script uses cuda.Event() to measure the time for the backward pass while using the standard Python time() to measure the forward pass. Does cuda.get_elapsed_time(cudaStartEvent, cudaEndEvent) capture the computation that happens in the backward pass before the CUDA kernel launch? I'm asking because Chainer apparently does a lot of work in Python (potentially negligible) before handing off to libcudnn on each forward and backward call.

zer0n avatar Mar 10 '16 01:03 zer0n
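
One way to sidestep the mixed-clock concern zer0n raises is to time both passes with the same host clock and an explicit synchronization hook. The sketch below is illustrative only; `bench` and `sync` are made-up names, and on a real GPU `sync` would be a device synchronize call rather than a no-op.

```python
import time

def bench(fn, niter, sync=lambda: None):
    """Average the wall-clock time of fn() over niter runs, using
    one clock for everything; sync() flushes any pending
    asynchronous work before the clock is read."""
    fn()
    sync()                            # warm-up run, not timed
    start = time.perf_counter()
    for _ in range(niter):
        fn()
    sync()                            # wait for all launched work
    total = time.perf_counter() - start
    return total / niter              # note the intended precedence

# usage with a dummy CPU workload standing in for forward/backward
avg = bench(lambda: sum(range(10000)), niter=5)
```

Using the same harness for both the forward and backward pass removes the question of whether a CUDA-event interval and a Python time() interval are measuring comparable things.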

+1 want to know which one is faster!

zhonghh avatar Oct 04 '16 18:10 zhonghh

+1 I'm looking for it.

ezineo avatar Feb 21 '17 10:02 ezineo