mxnet
Dummy-data based benchmarking script for Gluon RNN-API
Description
To easily evaluate the overall performance difference between the fused and unfused/stacked implementations of the Gluon RNN API, this script runs dummy-data benchmarks on the LSTM and GRU RNN cells. The input shape can either be explicitly specified or taken from a predefined shape list (the predefined shapes follow DeepBench).
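A minimal sketch of how such a dummy-data throughput/latency measurement can be structured; this is my own illustration, not the submitted script. `run_forward` is a hypothetical placeholder for one forward pass of a Gluon LSTM/GRU cell, and the shape layout `[batch, seq_len, input_size, hidden_size]` is assumed from the DeepBench-style lists in the results below.

```python
import time

# DeepBench-style shapes: [batch, seq_len, input_size, hidden_size]
SHAPES = [
    [64, 25, 512, 512],
    [128, 25, 1024, 1024],
]

def run_forward(batch, seq_len, input_size, hidden_size):
    """Hypothetical stand-in for one forward pass of a Gluon
    LSTM/GRU cell on dummy input of the given shape, e.g.
    net(mx.nd.random.uniform(shape=(seq_len, batch, input_size)))."""
    pass

def benchmark(shape, warmup=10, runs=100):
    batch, seq_len, input_size, hidden_size = shape
    # Warm-up iterations are excluded from the timing.
    for _ in range(warmup):
        run_forward(batch, seq_len, input_size, hidden_size)
    start = time.perf_counter()
    for _ in range(runs):
        run_forward(batch, seq_len, input_size, hidden_size)
    elapsed = time.perf_counter() - start
    sps = batch * runs / elapsed   # throughput in samples per second
    latency = elapsed / runs       # seconds per forward pass
    return sps, latency

for shape in SHAPES:
    sps, latency = benchmark(shape)
    print("Shape=%s, SPS=%.3f sps, latency=%.6f s" % (shape, sps, latency))
```

With a real Gluon network plugged into `run_forward`, a call such as `mx.nd.waitall()` after each pass would be needed so that MXNet's asynchronous execution does not distort the timing.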
Checklist
Essentials
Please feel free to remove inapplicable items for your PR.
- [ ] The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
- [x] Changes are complete (i.e. I finished coding on this PR)
- [x] All changes have test coverage:
- Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
- Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
- Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
- [ ] Code is well-documented:
- For user-facing API changes, API doc string has been updated.
- For new C++ functions in header files, their functionalities and arguments are documented.
- For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on the test set, and a reference to the original paper if applicable
- Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
- [x] To the best of my knowledge, examples are either not affected by this change or have been fixed to be compatible with it
Changes
Add a standalone Python file benchmark_gluon_rnn.py under benchmark/python/gluon
Comments
This script supports both GPU and CPU.
@sandeep-krishnamurthy Will you review this PR? Thanks.
@mxnet-label-bot [pr-awaiting-review]
@sandeep-krishnamurthy Will you review this script? Thanks!
@sandeep-krishnamurthy Sorry for the late response, I have revised the PR per your comments. Could you please take a look? Thanks.
@sandeep-krishnamurthy It seems that your comments have been addressed. Could you please take a look?
@sandeep-krishnamurthy Thanks for your time on the code review. May I know if you have had a chance to review my further modifications? Thanks.
@juliusshufan - Thanks for addressing the comments.
@szha - I am not sure if this will be useful for users. Can you please suggest/review? Thanks.
@szha May I know if you have had a chance to review this PR? Thanks.
Ping @juliusshufan ! Could you please address the review comments? We are looking forward to merging your PR.
@kalyc Thanks for the reminder, I am working on addressing the review comments and will keep this PR posted.
@mxnet-label-bot update [pr-work-in-progress]
@juliusshufan - Can you please address review comments and look at failed CI?
@juliusshufan can you address the comments and CI test failures
@nswamy @sandeep-krishnamurthy @anirudh2290 - Please consider closing this PR since there is no follow up from the author since November.
@mxnet-label-bot add[pr-awaiting-response]
@pinaraws Sorry for the long silence. Could you please keep it open for a while? I'll follow up asap.
@juliusshufan Were you able to complete this PR ?
@pengxin99 will keep following this PR through to completion, thanks.
@roywei Thanks for your review. I tested the script on an 8180 (1 socket, 28 cores); the results are below:
Throughput (SPS) benchmark results:
INFO:root:lstm inference benchmark.
INFO:root:For BS = 64, Layers = 1, Shape=[64, 15, 500, 500], SPS=20993.399 sps
INFO:root:For BS = 64, Layers = 1, Shape=[64, 20, 500, 500], SPS=17156.602 sps
INFO:root:For BS = 64, Layers = 1, Shape=[64, 25, 500, 500], SPS=14634.234 sps
INFO:root:For BS = 64, Layers = 1, Shape=[64, 30, 500, 500], SPS=13103.086 sps
INFO:root:For BS = 64, Layers = 1, Shape=[64, 35, 500, 500], SPS=11681.630 sps
INFO:root:For BS = 64, Layers = 1, Shape=[64, 40, 500, 500], SPS=10475.330 sps
INFO:root:For BS = 64, Layers = 1, Shape=[64, 45, 500, 500], SPS=9504.317 sps
INFO:root:For BS = 64, Layers = 1, Shape=[64, 50, 500, 500], SPS=8842.211 sps
INFO:root:For BS = 16, Layers = 1, Shape=[16, 25, 512, 512], SPS=6095.115 sps
INFO:root:For BS = 32, Layers = 1, Shape=[32, 25, 512, 512], SPS=8296.429 sps
INFO:root:For BS = 64, Layers = 1, Shape=[64, 25, 512, 512], SPS=10112.136 sps
INFO:root:For BS = 128, Layers = 1, Shape=[128, 25, 512, 512], SPS=12306.829 sps
INFO:root:For BS = 16, Layers = 1, Shape=[16, 25, 1024, 1024], SPS=1196.153 sps
INFO:root:For BS = 32, Layers = 1, Shape=[32, 25, 1024, 1024], SPS=1705.147 sps
INFO:root:For BS = 64, Layers = 1, Shape=[64, 25, 1024, 1024], SPS=2248.357 sps
INFO:root:For BS = 128, Layers = 1, Shape=[128, 25, 1024, 1024], SPS=3900.332 sps
INFO:root:For BS = 16, Layers = 1, Shape=[16, 25, 2048, 2048], SPS=280.221 sps
INFO:root:For BS = 32, Layers = 1, Shape=[32, 25, 2048, 2048], SPS=351.963 sps
INFO:root:For BS = 64, Layers = 1, Shape=[64, 25, 2048, 2048], SPS=487.378 sps
INFO:root:For BS = 128, Layers = 1, Shape=[128, 25, 2048, 2048], SPS=1070.538 sps
INFO:root:For BS = 16, Layers = 1, Shape=[16, 25, 4096, 4096], SPS=65.910 sps
INFO:root:For BS = 32, Layers = 1, Shape=[32, 25, 4096, 4096], SPS=85.363 sps
INFO:root:For BS = 64, Layers = 1, Shape=[64, 25, 4096, 4096], SPS=90.826 sps
INFO:root:For BS = 128, Layers = 1, Shape=[128, 25, 4096, 4096], SPS=281.542 sps
INFO:root:gru inference benchmark.
INFO:root:For BS = 64, Layers = 1, Shape=[64, 15, 500, 500], SPS=14952.023 sps
INFO:root:For BS = 64, Layers = 1, Shape=[64, 20, 500, 500], SPS=12003.757 sps
INFO:root:For BS = 64, Layers = 1, Shape=[64, 25, 500, 500], SPS=9956.502 sps
INFO:root:For BS = 64, Layers = 1, Shape=[64, 30, 500, 500], SPS=8532.300 sps
INFO:root:For BS = 64, Layers = 1, Shape=[64, 35, 500, 500], SPS=7446.000 sps
INFO:root:For BS = 64, Layers = 1, Shape=[64, 40, 500, 500], SPS=6586.347 sps
INFO:root:For BS = 64, Layers = 1, Shape=[64, 45, 500, 500], SPS=5961.260 sps
INFO:root:For BS = 64, Layers = 1, Shape=[64, 50, 500, 500], SPS=5449.126 sps
INFO:root:For BS = 16, Layers = 1, Shape=[16, 25, 512, 512], SPS=4347.048 sps
INFO:root:For BS = 32, Layers = 1, Shape=[32, 25, 512, 512], SPS=6195.475 sps
INFO:root:For BS = 64, Layers = 1, Shape=[64, 25, 512, 512], SPS=8700.350 sps
INFO:root:For BS = 128, Layers = 1, Shape=[128, 25, 512, 512], SPS=11402.683 sps
INFO:root:For BS = 16, Layers = 1, Shape=[16, 25, 1024, 1024], SPS=1327.629 sps
INFO:root:For BS = 32, Layers = 1, Shape=[32, 25, 1024, 1024], SPS=2022.031 sps
INFO:root:For BS = 64, Layers = 1, Shape=[64, 25, 1024, 1024], SPS=2940.143 sps
INFO:root:For BS = 128, Layers = 1, Shape=[128, 25, 1024, 1024], SPS=3987.736 sps
INFO:root:For BS = 16, Layers = 1, Shape=[16, 25, 2048, 2048], SPS=359.937 sps
INFO:root:For BS = 32, Layers = 1, Shape=[32, 25, 2048, 2048], SPS=592.503 sps
INFO:root:For BS = 64, Layers = 1, Shape=[64, 25, 2048, 2048], SPS=862.949 sps
INFO:root:For BS = 128, Layers = 1, Shape=[128, 25, 2048, 2048], SPS=1260.849 sps
INFO:root:For BS = 16, Layers = 1, Shape=[16, 25, 4096, 4096], SPS=88.472 sps
INFO:root:For BS = 32, Layers = 1, Shape=[32, 25, 4096, 4096], SPS=151.096 sps
INFO:root:For BS = 64, Layers = 1, Shape=[64, 25, 4096, 4096], SPS=240.240 sps
INFO:root:For BS = 128, Layers = 1, Shape=[128, 25, 4096, 4096], SPS=369.249 sps
Latency benchmark results:
INFO:root:lstm inference benchmark.
INFO:root:For BS = 1, Layers = 1, Shape=[1, 15, 500, 500], latency=0.003145 s
INFO:root:For BS = 1, Layers = 1, Shape=[1, 20, 500, 500], latency=0.003825 s
INFO:root:For BS = 1, Layers = 1, Shape=[1, 25, 500, 500], latency=0.004469 s
INFO:root:For BS = 1, Layers = 1, Shape=[1, 30, 500, 500], latency=0.005054 s
INFO:root:For BS = 1, Layers = 1, Shape=[1, 35, 500, 500], latency=0.005685 s
INFO:root:For BS = 1, Layers = 1, Shape=[1, 40, 500, 500], latency=0.006307 s
INFO:root:For BS = 1, Layers = 1, Shape=[1, 45, 500, 500], latency=0.007022 s
INFO:root:For BS = 1, Layers = 1, Shape=[1, 50, 500, 500], latency=0.007478 s
INFO:root:For BS = 1, Layers = 1, Shape=[1, 25, 512, 512], latency=0.002655 s
INFO:root:For BS = 1, Layers = 1, Shape=[1, 25, 512, 512], latency=0.003892 s
INFO:root:For BS = 1, Layers = 1, Shape=[1, 25, 512, 512], latency=0.006760 s
INFO:root:For BS = 1, Layers = 1, Shape=[1, 25, 512, 512], latency=0.010551 s
INFO:root:For BS = 1, Layers = 1, Shape=[1, 25, 1024, 1024], latency=0.013230 s
INFO:root:For BS = 1, Layers = 1, Shape=[1, 25, 1024, 1024], latency=0.018453 s
INFO:root:For BS = 1, Layers = 1, Shape=[1, 25, 1024, 1024], latency=0.029798 s
INFO:root:For BS = 1, Layers = 1, Shape=[1, 25, 1024, 1024], latency=0.034038 s
INFO:root:For BS = 1, Layers = 1, Shape=[1, 25, 2048, 2048], latency=0.057398 s
INFO:root:For BS = 1, Layers = 1, Shape=[1, 25, 2048, 2048], latency=0.090056 s
INFO:root:For BS = 1, Layers = 1, Shape=[1, 25, 2048, 2048], latency=0.138633 s
INFO:root:For BS = 1, Layers = 1, Shape=[1, 25, 2048, 2048], latency=0.126094 s
INFO:root:For BS = 1, Layers = 1, Shape=[1, 25, 4096, 4096], latency=0.290858 s
INFO:root:For BS = 1, Layers = 1, Shape=[1, 25, 4096, 4096], latency=0.410212 s
INFO:root:For BS = 1, Layers = 1, Shape=[1, 25, 4096, 4096], latency=0.735229 s
INFO:root:For BS = 1, Layers = 1, Shape=[1, 25, 4096, 4096], latency=0.472423 s
INFO:root:gru inference benchmark.
INFO:root:For BS = 1, Layers = 1, Shape=[1, 15, 500, 500], latency=0.001689 s
INFO:root:For BS = 1, Layers = 1, Shape=[1, 20, 500, 500], latency=0.001951 s
INFO:root:For BS = 1, Layers = 1, Shape=[1, 25, 500, 500], latency=0.002212 s
INFO:root:For BS = 1, Layers = 1, Shape=[1, 30, 500, 500], latency=0.002472 s
INFO:root:For BS = 1, Layers = 1, Shape=[1, 35, 500, 500], latency=0.002746 s
INFO:root:For BS = 1, Layers = 1, Shape=[1, 40, 500, 500], latency=0.003000 s
INFO:root:For BS = 1, Layers = 1, Shape=[1, 45, 500, 500], latency=0.003244 s
INFO:root:For BS = 1, Layers = 1, Shape=[1, 50, 500, 500], latency=0.003519 s
INFO:root:For BS = 1, Layers = 1, Shape=[1, 25, 512, 512], latency=0.002322 s
INFO:root:For BS = 1, Layers = 1, Shape=[1, 25, 512, 512], latency=0.002309 s
INFO:root:For BS = 1, Layers = 1, Shape=[1, 25, 512, 512], latency=0.002331 s
INFO:root:For BS = 1, Layers = 1, Shape=[1, 25, 512, 512], latency=0.002314 s
INFO:root:For BS = 1, Layers = 1, Shape=[1, 25, 1024, 1024], latency=0.006032 s
INFO:root:For BS = 1, Layers = 1, Shape=[1, 25, 1024, 1024], latency=0.006082 s
INFO:root:For BS = 1, Layers = 1, Shape=[1, 25, 1024, 1024], latency=0.006082 s
INFO:root:For BS = 1, Layers = 1, Shape=[1, 25, 1024, 1024], latency=0.005999 s
INFO:root:For BS = 1, Layers = 1, Shape=[1, 25, 2048, 2048], latency=0.022420 s
INFO:root:For BS = 1, Layers = 1, Shape=[1, 25, 2048, 2048], latency=0.022164 s
INFO:root:For BS = 1, Layers = 1, Shape=[1, 25, 2048, 2048], latency=0.022387 s
INFO:root:For BS = 1, Layers = 1, Shape=[1, 25, 2048, 2048], latency=0.022163 s
INFO:root:For BS = 1, Layers = 1, Shape=[1, 25, 4096, 4096], latency=0.100981 s
INFO:root:For BS = 1, Layers = 1, Shape=[1, 25, 4096, 4096], latency=0.101482 s
INFO:root:For BS = 1, Layers = 1, Shape=[1, 25, 4096, 4096], latency=0.102126 s
INFO:root:For BS = 1, Layers = 1, Shape=[1, 25, 4096, 4096], latency=0.101685 s
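As a sanity check across the two tables: since SPS is samples per second and latency is seconds per batch, at BS = 1 the two metrics are simply reciprocals (SPS = BS / latency). A tiny helper of my own (not part of the script) to convert between them:

```python
def latency_to_sps(latency_s, batch_size=1):
    """Convert per-batch latency (seconds) to samples per second."""
    return batch_size / latency_s

def sps_to_latency(sps, batch_size=1):
    """Convert samples per second back to per-batch latency (seconds)."""
    return batch_size / sps

# Example: the BS=1 LSTM run with Shape=[1, 25, 512, 512] above reports
# latency=0.002655 s, i.e. roughly 1/0.002655 ~= 377 samples/sec.
print(latency_to_sps(0.002655))
```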
@kalyc @vandanavk @anirudhacharya I have addressed the review comments. Could you please take a look? Thanks.
@mxnet-bot run ci [all]
Jenkins CI successfully triggered : [website, unix-gpu, windows-gpu, sanity, centos-gpu, clang, edge, miscellaneous, unix-cpu, centos-cpu, windows-cpu]