benchmark
Optimize the performance of the seq2seq model on GPU
Initial performance
- Test date: August 8, 2019
- Tested by: @Xreki

Profiling results
-------------------------> Profiling Report <-------------------------
Note! This Report merge all thread info into one.
Place: All
Time unit: ms
Sorted by total time in descending order in the same thread
Event Calls Total CPU Time (Ratio) GPU Time (Ratio) Min. Max. Ave. Ratio.
recurrent_grad 20 3334.05 3130.058004 (0.938815) 203.995332 (0.061185) 27.1338 260.926 166.703 0.308526
recurrent 20 1497.97 1452.955293 (0.969947) 45.018247 (0.030053) 16.0935 110.292 74.8987 0.138619
elementwise_add_grad 4795 1377.74 168.491482 (0.122296) 1209.244141 (0.877704) 0.026835 1.0058 0.287328 0.127493
matmul_grad 3196 844.682 311.706298 (0.369022) 532.976110 (0.630978) 0.087976 13.3952 0.264294 0.0781651
matmul 3196 526.856 173.302980 (0.328938) 353.553345 (0.671062) 0.036523 8.69278 0.164849 0.0487542
sum 12389 503.771 391.555491 (0.777248) 112.215826 (0.222752) 0.021764 6.88414 0.0406628 0.0466179
GpuMemcpyAsync(same_gpu):GPU->GPU 18475 334.162 303.616585 (0.908592) 30.544991 (0.091408) 0.013014 0.120487 0.0180872 0.0309226
elementwise_mul_grad 6734 256.62 238.810646 (0.930602) 17.808921 (0.069398) 0.027794 7.31021 0.038108 0.023747
elementwise_mul 6864 203.549 187.715197 (0.922212) 15.833627 (0.077788) 0.020651 1.87285 0.0296545 0.018836
elementwise_add 4795 164.711 153.425376 (0.931485) 11.285194 (0.068515) 0.021858 1.79897 0.0343505 0.015242
concat 3794 164.522 137.125469 (0.833479) 27.396308 (0.166521) 0.029323 0.125735 0.0433637 0.0152245
rnn_memory_helper_grad 5244 140.187 140.106174 (0.999421) 0.081113 (0.000579) 0.020427 0.150314 0.0267329 0.0129726
rnn_memory_helper 5244 134.466 134.422311 (0.999676) 0.043546 (0.000324) 0.02023 0.131774 0.0256418 0.0124432
concat_grad 2340 114.213 102.730713 (0.899469) 11.481954 (0.100531) 0.032481 0.15419 0.0488088 0.010569
sigmoid_grad 4362 113.874 107.131844 (0.940796) 6.741797 (0.059204) 0.020312 0.086206 0.0261058 0.0105376
sigmoid 4362 108.18 101.646670 (0.939606) 6.533444 (0.060394) 0.019966 0.132063 0.0248006 0.0100108
elementwise_sub 2362 79.7406 75.840477 (0.951089) 3.900169 (0.048911) 0.027971 0.095837 0.0337598 0.00737903
tanh_grad 2908 76.4149 71.989046 (0.942082) 4.425817 (0.057918) 0.021056 0.090353 0.0262775 0.00707127
split 1454 75.8946 60.162568 (0.792712) 15.732015 (0.207288) 0.043571 0.630367 0.0521971 0.00702312
elementwise_sub_grad 2352 71.542 68.192457 (0.953180) 3.349583 (0.046820) 0.022459 0.168436 0.0304175 0.00662035
tanh 2908 67.9909 63.220004 (0.929830) 4.770911 (0.070170) 0.019094 0.128054 0.0233806 0.00629174
reshape2 1762 67.5811 67.563221 (0.999736) 0.017841 (0.000264) 0.012027 0.104218 0.0383547 0.00625381
dropout 1454 67.5326 52.996526 (0.784755) 14.536027 (0.215245) 0.038525 0.110662 0.046446 0.00624932
dropout_grad 1454 50.5008 47.855241 (0.947613) 2.645600 (0.052387) 0.027463 0.624477 0.0347324 0.00467324
fill_constant 1496 41.2608 38.647031 (0.936653) 2.613757 (0.063347) 0.019868 0.110909 0.0275807 0.00381819
GpuMemcpyAsync:CPU->GPU 2684 38.2639 33.139905 (0.866089) 5.123964 (0.133911) 0.008246 0.077719 0.0142563 0.00354086
softmax 433 37.6717 32.780017 (0.870150) 4.891664 (0.129850) 0.075404 0.126331 0.0870016 0.00348606
transpose2_grad 906 29.4507 26.412712 (0.896845) 3.037988 (0.103155) 0.023196 0.156133 0.0325063 0.00272531
unsqueeze2 433 29.4025 29.319538 (0.997179) 0.082947 (0.002821) 0.059813 0.128764 0.0679041 0.00272084
transpose2 916 28.7084 25.577694 (0.890949) 3.130678 (0.109051) 0.023465 0.092804 0.031341 0.00265661
softmax_grad 433 27.5446 23.813278 (0.864535) 3.731316 (0.135465) 0.05221 0.123997 0.0636134 0.00254892
squeeze2_grad 433 27.0135 26.944857 (0.997458) 0.068676 (0.002542) 0.054414 0.133838 0.0623869 0.00249978
eager_deletion 870 26.7456 26.736183 (0.999646) 0.009456 (0.000354) 0.002324 1.38676 0.0307421 0.00247499
squeeze2 433 25.2393 25.181926 (0.997725) 0.057410 (0.002275) 0.050768 0.135624 0.0582895 0.00233559
unsqueeze2_grad 433 25.0274 24.968753 (0.997655) 0.058683 (0.002345) 0.050575 0.119195 0.0578001 0.00231599
softmax_with_cross_entropy 10 18.02 2.409284 (0.133700) 15.610725 (0.866300) 0.722111 2.15961 1.802 0.00166753
adam 130 14.9652 5.707370 (0.381375) 9.257877 (0.618625) 0.034774 0.354246 0.115117 0.00138485
reduce_sum 140 10.8105 8.748800 (0.809285) 2.061724 (0.190715) 0.037435 0.162729 0.077218 0.00100038
square 130 6.64256 3.823993 (0.575681) 2.818564 (0.424319) 0.022074 0.138738 0.0510966 0.000614688
scale 260 6.51412 6.142491 (0.942950) 0.371628 (0.057050) 0.019617 0.263 0.0250543 0.000602803
lookup_table_grad 20 5.0277 1.516919 (0.301712) 3.510781 (0.698288) 0.131618 0.334729 0.251385 0.000465253
softmax_with_cross_entropy_grad 10 5.01725 0.894024 (0.178190) 4.123227 (0.821810) 0.222553 0.583161 0.501725 0.000464286
sequence_mask 30 4.32902 4.122014 (0.952182) 0.207004 (0.047818) 0.109838 0.224241 0.144301 0.000400598
slice_grad 80 3.75367 3.079189 (0.820314) 0.674482 (0.179686) 0.028986 0.145544 0.0469209 0.000347357
slice 110 3.24341 3.036975 (0.936351) 0.206440 (0.063649) 0.012624 0.076516 0.0294856 0.000300139
lookup_table 20 3.21076 0.940230 (0.292837) 2.270531 (0.707163) 0.056535 0.242628 0.160538 0.000297117
fill_constant_batch_size_like 50 1.41506 1.316025 (0.930012) 0.099038 (0.069988) 0.023255 0.042394 0.0283013 0.000130947
TensorCopy:GPU->CPU 30 1.08744 1.044846 (0.960830) 0.042595 (0.039170) 0.03174 0.046723 0.036248 0.000100629
GpuMemcpySync:GPU->CPU 30 0.990879 0.899023 (0.907298) 0.091856 (0.092702) 0.029195 0.04386 0.0330293 9.16939e-05
TensorCopy:CPU->GPU 30 0.986733 0.953919 (0.966745) 0.032814 (0.033255) 0.027728 0.046333 0.0328911 9.13102e-05
Fetch 10 0.946421 0.866478 (0.915531) 0.079943 (0.084469) 0.079943 0.133364 0.0946421 8.75798e-05
GpuMemcpySync:CPU->GPU 30 0.904039 0.824342 (0.911843) 0.079697 (0.088157) 0.024434 0.042919 0.0301346 8.36579e-05
reduce_sum_grad 10 0.671832 0.578052 (0.860412) 0.093780 (0.139588) 0.063234 0.076912 0.0671832 6.21699e-05
reduce_mean 10 0.660142 0.576509 (0.873311) 0.083633 (0.126689) 0.054697 0.114415 0.0660142 6.10882e-05
elementwise_max 10 0.558056 0.495460 (0.887832) 0.062596 (0.112168) 0.047459 0.07431 0.0558056 5.16413e-05
reshape2_grad 30 0.51679 0.500414 (0.968312) 0.016376 (0.031688) 0.013623 0.02873 0.0172263 4.78227e-05
GpuMemcpyAsync:GPU->CPU 10 0.466456 0.409467 (0.877826) 0.056989 (0.122174) 0.039485 0.063719 0.0466456 4.31649e-05
FastThreadedSSAGraphExecutorPrepare 10 0.453106 0.396860 (0.875866) 0.056246 (0.124134) 0.039105 0.056246 0.0453106 4.19295e-05
elementwise_div 10 0.446607 0.392265 (0.878323) 0.054342 (0.121677) 0.041335 0.060296 0.0446607 4.13281e-05
reduce_mean_grad 10 0.436627 0.371762 (0.851441) 0.064865 (0.148559) 0.040098 0.056068 0.0436627 4.04045e-05
sqrt 10 0.428603 0.366501 (0.855106) 0.062102 (0.144894) 0.034099 0.062782 0.0428603 3.9662e-05
Scale LossGrad 10 0.371738 0.334618 (0.900145) 0.037120 (0.099855) 0.032202 0.048616 0.0371738 3.43999e-05
shape 30 0.36791 0.342755 (0.931627) 0.025155 (0.068373) 0.008387 0.028157 0.0122637 3.40456e-05
TensorCopy:GPU->GPU 30 0.06052 0.058630 (0.968771) 0.001890 (0.031229) 0.001675 0.002818 0.00201733 5.60039e-06
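A quick way to act on a report like the one above is to filter for ops whose time is dominated by the CPU side (framework/kernel-launch overhead rather than GPU work) — these are the fusion candidates discussed below. The helper below is a hypothetical sketch, not part of Paddle; it assumes the whitespace-separated row layout shown in the report (Event, Calls, Total, CPU ms, (ratio), GPU ms, (ratio), ...).

```python
import re

def parse_profile_line(line):
    """Parse one event row of a profiler report in the layout above.

    Returns a dict with timing fields, or None if the line does not
    look like an event row (e.g. headers, notes).
    """
    m = re.match(
        r"(\S+)\s+(\d+)\s+([\d.]+)\s+([\d.]+)\s+\(([\d.]+)\)"
        r"\s+([\d.]+)\s+\(([\d.]+)\)",
        line,
    )
    if m is None:
        return None
    return {
        "event": m.group(1),
        "calls": int(m.group(2)),
        "total_ms": float(m.group(3)),
        "cpu_ms": float(m.group(4)),
        "cpu_ratio": float(m.group(5)),
        "gpu_ms": float(m.group(6)),
        "gpu_ratio": float(m.group(7)),
    }

def cpu_bound_ops(report, min_total_ms=50.0, min_cpu_ratio=0.9):
    """Return events that spend most of their time on the CPU side.

    High CPU ratio on a nominally-GPU op usually means launch or
    framework overhead dominates -- a hint that the op is too small
    and a candidate for fusion.
    """
    rows = (parse_profile_line(l) for l in report.splitlines())
    return [r["event"] for r in rows if r is not None
            and r["total_ms"] >= min_total_ms
            and r["cpu_ratio"] >= min_cpu_ratio]
```

Applied to the report above, this surfaces exactly the ops called out in the analysis below: the `elementwise_*`, `sigmoid`/`tanh`, and `rnn_memory_helper*` rows all have CPU ratios above 0.9 despite running on the GPU.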
Timeline analysis
- Step overview:
- Does the model use two StaticRNN structures?
- Significant GPU idle time is clearly visible.
- At the start of recurrent, Operators must be created and the ExecutorPrepareContext prepared; each step uses its own step_scope, in which Variables must be created.
- GPU utilization is low both in the preparation phase and inside the StaticRNN.
- These are all elementwise computations and could be fused; a general mechanism supporting this kind of fusion is under development.
- rnn_memory_helper introduces many GPU <-> GPU memory copies; this should be addressed at the design level.
- recurrent_grad contains a synchronization point that causes long waits.
- Gradient aggregation uses many sum_op instances (a PR optimizing this is already in progress).
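To illustrate the elementwise-fusion point: a gate computation such as `sigmoid(x) * tanh(h) + b`, executed as separate ops, launches one kernel per op and materializes a full-size temporary after each step, so the data is read and written several times. A fused kernel makes a single pass. The NumPy sketch below (hypothetical names, not the Paddle fusion pass) contrasts the two forms; the explicit per-element loop stands in for the body of a single generated GPU kernel.

```python
import numpy as np

def gate_unfused(x, h, b):
    """Separate elementwise ops: each line corresponds to one kernel
    launch and one full-size temporary array (extra memory traffic)."""
    s = 1.0 / (1.0 + np.exp(-x))   # sigmoid
    t = np.tanh(h)                 # tanh
    m = s * t                      # elementwise_mul
    return m + b                   # elementwise_add

def gate_fused(x, h, b):
    """Fused form: one traversal of the data, no intermediates.
    The Python loop models the per-element body of a fused kernel."""
    out = np.empty_like(x)
    for i in np.ndindex(x.shape):
        out[i] = 1.0 / (1.0 + np.exp(-x[i])) * np.tanh(h[i]) + b[i]
    return out
```

Both forms compute the same result; the win on GPU comes from collapsing four kernel launches and three temporaries into one launch with no temporaries, which is exactly why the `elementwise_*`, `sigmoid`, and `tanh` rows above show CPU-dominated (launch-overhead-dominated) time.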