mxnet
Dummy-data based benchmarking script for Gluon RNN-API
Description
To easily evaluate the overall performance difference between the fused and unfused/stacked implementations of the Gluon RNN API, this script runs dummy-data benchmarks on the LSTM and GRU RNN cells. The input shape can either be explicitly specified or taken from a predefined shape list (the predefined shapes follow DeepBench).
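A minimal sketch of how such a dummy-data throughput/latency measurement can be structured; this is my own illustration, not the submitted script. `run_forward` is a hypothetical placeholder for one forward pass of a Gluon LSTM/GRU cell, and the shape layout `[batch, seq_len, input_size, hidden_size]` is assumed from the DeepBench-style lists in the results below.

```python
import time

# DeepBench-style shapes: [batch, seq_len, input_size, hidden_size]
SHAPES = [
    [64, 25, 512, 512],
    [128, 25, 1024, 1024],
]

def run_forward(batch, seq_len, input_size, hidden_size):
    """Hypothetical stand-in for one forward pass of a Gluon
    LSTM/GRU cell on dummy input of the given shape, e.g.
    net(mx.nd.random.uniform(shape=(seq_len, batch, input_size)))."""
    pass

def benchmark(shape, warmup=10, runs=100):
    batch, seq_len, input_size, hidden_size = shape
    # Warm-up iterations are excluded from the timing.
    for _ in range(warmup):
        run_forward(batch, seq_len, input_size, hidden_size)
    start = time.perf_counter()
    for _ in range(runs):
        run_forward(batch, seq_len, input_size, hidden_size)
    elapsed = time.perf_counter() - start
    sps = batch * runs / elapsed   # throughput in samples per second
    latency = elapsed / runs       # seconds per forward pass
    return sps, latency

for shape in SHAPES:
    sps, latency = benchmark(shape)
    print("Shape=%s, SPS=%.3f sps, latency=%.6f s" % (shape, sps, latency))
```

With a real Gluon network plugged into `run_forward`, a call such as `mx.nd.waitall()` after each pass would be needed so that MXNet's asynchronous execution does not distort the timing.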
Checklist
Essentials
Please feel free to remove inapplicable items for your PR.
- [ ] The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
- [x] Changes are complete (i.e. I finished coding on this PR)
- [x] All changes have test coverage:
- Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
- Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
- Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
- [ ] Code is well-documented:
- For user-facing API changes, API doc string has been updated.
- For new C++ functions in header files, their functionalities and arguments are documented.
- For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on the test set, and a reference to the original paper if applicable
- Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
- [x] To the best of my knowledge, examples are either not affected by this change or have been fixed to be compatible with it
Changes
Add a standalone Python file benchmark_gluon_rnn.py under benchmark/python/gluon
Comments
This script supports both GPU and CPU.
@sandeep-krishnamurthy Will you review this PR? Thanks.
@mxnet-label-bot [pr-awaiting-review]
@sandeep-krishnamurthy Will you review this script? Thanks!
@sandeep-krishnamurthy Sorry for the late response, I have revised the PR per your comments. Could you please take a look? Thanks.
@sandeep-krishnamurthy It seems that your comments have been addressed. Could you please take a look?
@sandeep-krishnamurthy Thanks for your time on the code review. May I know if you have had a chance to review my further modifications? Thanks.
@juliusshufan - Thanks for addressing the comments.
@szha - I am not sure if this will be useful for users. Can you please suggest/review? Thanks.
@szha May I know if you have had a chance to review this PR? Thanks.
Ping @juliusshufan ! Could you please address the review comments? We are looking forward to merging your PR.
@kalyc Thanks for the reminder, I am working on addressing the review comments and will keep this PR posted.
@mxnet-label-bot update [pr-work-in-progress]
@juliusshufan - Can you please address review comments and look at failed CI?
@juliusshufan can you address the comments and CI test failures
@nswamy @sandeep-krishnamurthy @anirudh2290 - Please consider closing this PR since there is no follow up from the author since November.
@mxnet-label-bot add[pr-awaiting-response]
@pinaraws Sorry for the long silence. Could you please keep it open for a while? I'll follow up asap.
@juliusshufan Were you able to complete this PR ?
@pengxin99 will keep following this PR through to completion, thanks.
@roywei Thanks for your review. I tested the script on an 8180 (1 socket, 28 cores); the results are below:
Throughput (SPS) benchmark results:
INFO:root:lstm inference benchmark.
INFO:root:For BS = 64, Layers = 1, Shape=[64, 15, 500, 500], SPS=20993.399 sps
INFO:root:For BS = 64, Layers = 1, Shape=[64, 20, 500, 500], SPS=17156.602 sps
INFO:root:For BS = 64, Layers = 1, Shape=[64, 25, 500, 500], SPS=14634.234 sps
INFO:root:For BS = 64, Layers = 1, Shape=[64, 30, 500, 500], SPS=13103.086 sps
INFO:root:For BS = 64, Layers = 1, Shape=[64, 35, 500, 500], SPS=11681.630 sps
INFO:root:For BS = 64, Layers = 1, Shape=[64, 40, 500, 500], SPS=10475.330 sps
INFO:root:For BS = 64, Layers = 1, Shape=[64, 45, 500, 500], SPS=9504.317 sps
INFO:root:For BS = 64, Layers = 1, Shape=[64, 50, 500, 500], SPS=8842.211 sps
INFO:root:For BS = 16, Layers = 1, Shape=[16, 25, 512, 512], SPS=6095.115 sps
INFO:root:For BS = 32, Layers = 1, Shape=[32, 25, 512, 512], SPS=8296.429 sps
INFO:root:For BS = 64, Layers = 1, Shape=[64, 25, 512, 512], SPS=10112.136 sps
INFO:root:For BS = 128, Layers = 1, Shape=[128, 25, 512, 512], SPS=12306.829 sps
INFO:root:For BS = 16, Layers = 1, Shape=[16, 25, 1024, 1024], SPS=1196.153 sps
INFO:root:For BS = 32, Layers = 1, Shape=[32, 25, 1024, 1024], SPS=1705.147 sps
INFO:root:For BS = 64, Layers = 1, Shape=[64, 25, 1024, 1024], SPS=2248.357 sps
INFO:root:For BS = 128, Layers = 1, Shape=[128, 25, 1024, 1024], SPS=3900.332 sps
INFO:root:For BS = 16, Layers = 1, Shape=[16, 25, 2048, 2048], SPS=280.221 sps
INFO:root:For BS = 32, Layers = 1, Shape=[32, 25, 2048, 2048], SPS=351.963 sps
INFO:root:For BS = 64, Layers = 1, Shape=[64, 25, 2048, 2048], SPS=487.378 sps
INFO:root:For BS = 128, Layers = 1, Shape=[128, 25, 2048, 2048], SPS=1070.538 sps
INFO:root:For BS = 16, Layers = 1, Shape=[16, 25, 4096, 4096], SPS=65.910 sps
INFO:root:For BS = 32, Layers = 1, Shape=[32, 25, 4096, 4096], SPS=85.363 sps
INFO:root:For BS = 64, Layers = 1, Shape=[64, 25, 4096, 4096], SPS=90.826 sps
INFO:root:For BS = 128, Layers = 1, Shape=[128, 25, 4096, 4096], SPS=281.542 sps
INFO:root:gru inference benchmark.
INFO:root:For BS = 64, Layers = 1, Shape=[64, 15, 500, 500], SPS=14952.023 sps
INFO:root:For BS = 64, Layers = 1, Shape=[64, 20, 500, 500], SPS=12003.757 sps
INFO:root:For BS = 64, Layers = 1, Shape=[64, 25, 500, 500], SPS=9956.502 sps
INFO:root:For BS = 64, Layers = 1, Shape=[64, 30, 500, 500], SPS=8532.300 sps
INFO:root:For BS = 64, Layers = 1, Shape=[64, 35, 500, 500], SPS=7446.000 sps
INFO:root:For BS = 64, Layers = 1, Shape=[64, 40, 500, 500], SPS=6586.347 sps
INFO:root:For BS = 64, Layers = 1, Shape=[64, 45, 500, 500], SPS=5961.260 sps
INFO:root:For BS = 64, Layers = 1, Shape=[64, 50, 500, 500], SPS=5449.126 sps
INFO:root:For BS = 16, Layers = 1, Shape=[16, 25, 512, 512], SPS=4347.048 sps
INFO:root:For BS = 32, Layers = 1, Shape=[32, 25, 512, 512], SPS=6195.475 sps
INFO:root:For BS = 64, Layers = 1, Shape=[64, 25, 512, 512], SPS=8700.350 sps
INFO:root:For BS = 128, Layers = 1, Shape=[128, 25, 512, 512], SPS=11402.683 sps
INFO:root:For BS = 16, Layers = 1, Shape=[16, 25, 1024, 1024], SPS=1327.629 sps
INFO:root:For BS = 32, Layers = 1, Shape=[32, 25, 1024, 1024], SPS=2022.031 sps
INFO:root:For BS = 64, Layers = 1, Shape=[64, 25, 1024, 1024], SPS=2940.143 sps
INFO:root:For BS = 128, Layers = 1, Shape=[128, 25, 1024, 1024], SPS=3987.736 sps
INFO:root:For BS = 16, Layers = 1, Shape=[16, 25, 2048, 2048], SPS=359.937 sps
INFO:root:For BS = 32, Layers = 1, Shape=[32, 25, 2048, 2048], SPS=592.503 sps
INFO:root:For BS = 64, Layers = 1, Shape=[64, 25, 2048, 2048], SPS=862.949 sps
INFO:root:For BS = 128, Layers = 1, Shape=[128, 25, 2048, 2048], SPS=1260.849 sps
INFO:root:For BS = 16, Layers = 1, Shape=[16, 25, 4096, 4096], SPS=88.472 sps
INFO:root:For BS = 32, Layers = 1, Shape=[32, 25, 4096, 4096], SPS=151.096 sps
INFO:root:For BS = 64, Layers = 1, Shape=[64, 25, 4096, 4096], SPS=240.240 sps
INFO:root:For BS = 128, Layers = 1, Shape=[128, 25, 4096, 4096], SPS=369.249 sps
Latency benchmark results:
INFO:root:lstm inference benchmark.
INFO:root:For BS = 1, Layers = 1, Shape=[1, 15, 500, 500], latency=0.003145 s
INFO:root:For BS = 1, Layers = 1, Shape=[1, 20, 500, 500], latency=0.003825 s
INFO:root:For BS = 1, Layers = 1, Shape=[1, 25, 500, 500], latency=0.004469 s
INFO:root:For BS = 1, Layers = 1, Shape=[1, 30, 500, 500], latency=0.005054 s
INFO:root:For BS = 1, Layers = 1, Shape=[1, 35, 500, 500], latency=0.005685 s
INFO:root:For BS = 1, Layers = 1, Shape=[1, 40, 500, 500], latency=0.006307 s
INFO:root:For BS = 1, Layers = 1, Shape=[1, 45, 500, 500], latency=0.007022 s
INFO:root:For BS = 1, Layers = 1, Shape=[1, 50, 500, 500], latency=0.007478 s
INFO:root:For BS = 1, Layers = 1, Shape=[1, 25, 512, 512], latency=0.002655 s
INFO:root:For BS = 1, Layers = 1, Shape=[1, 25, 512, 512], latency=0.003892 s
INFO:root:For BS = 1, Layers = 1, Shape=[1, 25, 512, 512], latency=0.006760 s
INFO:root:For BS = 1, Layers = 1, Shape=[1, 25, 512, 512], latency=0.010551 s
INFO:root:For BS = 1, Layers = 1, Shape=[1, 25, 1024, 1024], latency=0.013230 s
INFO:root:For BS = 1, Layers = 1, Shape=[1, 25, 1024, 1024], latency=0.018453 s
INFO:root:For BS = 1, Layers = 1, Shape=[1, 25, 1024, 1024], latency=0.029798 s
INFO:root:For BS = 1, Layers = 1, Shape=[1, 25, 1024, 1024], latency=0.034038 s
INFO:root:For BS = 1, Layers = 1, Shape=[1, 25, 2048, 2048], latency=0.057398 s
INFO:root:For BS = 1, Layers = 1, Shape=[1, 25, 2048, 2048], latency=0.090056 s
INFO:root:For BS = 1, Layers = 1, Shape=[1, 25, 2048, 2048], latency=0.138633 s
INFO:root:For BS = 1, Layers = 1, Shape=[1, 25, 2048, 2048], latency=0.126094 s
INFO:root:For BS = 1, Layers = 1, Shape=[1, 25, 4096, 4096], latency=0.290858 s
INFO:root:For BS = 1, Layers = 1, Shape=[1, 25, 4096, 4096], latency=0.410212 s
INFO:root:For BS = 1, Layers = 1, Shape=[1, 25, 4096, 4096], latency=0.735229 s
INFO:root:For BS = 1, Layers = 1, Shape=[1, 25, 4096, 4096], latency=0.472423 s
INFO:root:gru inference benchmark.
INFO:root:For BS = 1, Layers = 1, Shape=[1, 15, 500, 500], latency=0.001689 s
INFO:root:For BS = 1, Layers = 1, Shape=[1, 20, 500, 500], latency=0.001951 s
INFO:root:For BS = 1, Layers = 1, Shape=[1, 25, 500, 500], latency=0.002212 s
INFO:root:For BS = 1, Layers = 1, Shape=[1, 30, 500, 500], latency=0.002472 s
INFO:root:For BS = 1, Layers = 1, Shape=[1, 35, 500, 500], latency=0.002746 s
INFO:root:For BS = 1, Layers = 1, Shape=[1, 40, 500, 500], latency=0.003000 s
INFO:root:For BS = 1, Layers = 1, Shape=[1, 45, 500, 500], latency=0.003244 s
INFO:root:For BS = 1, Layers = 1, Shape=[1, 50, 500, 500], latency=0.003519 s
INFO:root:For BS = 1, Layers = 1, Shape=[1, 25, 512, 512], latency=0.002322 s
INFO:root:For BS = 1, Layers = 1, Shape=[1, 25, 512, 512], latency=0.002309 s
INFO:root:For BS = 1, Layers = 1, Shape=[1, 25, 512, 512], latency=0.002331 s
INFO:root:For BS = 1, Layers = 1, Shape=[1, 25, 512, 512], latency=0.002314 s
INFO:root:For BS = 1, Layers = 1, Shape=[1, 25, 1024, 1024], latency=0.006032 s
INFO:root:For BS = 1, Layers = 1, Shape=[1, 25, 1024, 1024], latency=0.006082 s
INFO:root:For BS = 1, Layers = 1, Shape=[1, 25, 1024, 1024], latency=0.006082 s
INFO:root:For BS = 1, Layers = 1, Shape=[1, 25, 1024, 1024], latency=0.005999 s
INFO:root:For BS = 1, Layers = 1, Shape=[1, 25, 2048, 2048], latency=0.022420 s
INFO:root:For BS = 1, Layers = 1, Shape=[1, 25, 2048, 2048], latency=0.022164 s
INFO:root:For BS = 1, Layers = 1, Shape=[1, 25, 2048, 2048], latency=0.022387 s
INFO:root:For BS = 1, Layers = 1, Shape=[1, 25, 2048, 2048], latency=0.022163 s
INFO:root:For BS = 1, Layers = 1, Shape=[1, 25, 4096, 4096], latency=0.100981 s
INFO:root:For BS = 1, Layers = 1, Shape=[1, 25, 4096, 4096], latency=0.101482 s
INFO:root:For BS = 1, Layers = 1, Shape=[1, 25, 4096, 4096], latency=0.102126 s
INFO:root:For BS = 1, Layers = 1, Shape=[1, 25, 4096, 4096], latency=0.101685 s
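As a sanity check across the two tables: since SPS is samples per second and latency is seconds per batch, at BS = 1 the two metrics are simply reciprocals (SPS = BS / latency). A tiny helper of my own (not part of the script) to convert between them:

```python
def latency_to_sps(latency_s, batch_size=1):
    """Convert per-batch latency (seconds) to samples per second."""
    return batch_size / latency_s

def sps_to_latency(sps, batch_size=1):
    """Convert samples per second back to per-batch latency (seconds)."""
    return batch_size / sps

# Example: the BS=1 LSTM run with Shape=[1, 25, 512, 512] above reports
# latency=0.002655 s, i.e. roughly 1/0.002655 ~= 377 samples/sec.
print(latency_to_sps(0.002655))
```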
@kalyc @vandanavk @anirudhacharya I have addressed the review comments. Could you please take a look? Thanks.
@mxnet-bot run ci [all]
Jenkins CI successfully triggered : [website, unix-gpu, windows-gpu, sanity, centos-gpu, clang, edge, miscellaneous, unix-cpu, centos-cpu, windows-cpu]