[Meta Schedule][XGBoost] Update the custom callback function of xgboost in meta schedule

Open shingjan opened this issue 2 years ago • 8 comments

This PR intends to update the custom callback function of xgboost in meta schedule.

This change is tested against xgboost 1.2.0, 1.5.2, and 1.6.0 by running tests/python/unittest/test_meta_schedule_cost_model.py, to ensure backward compatibility.

This is related to the second action item in #12009.
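
For readers unfamiliar with the two callback styles, here is a minimal sketch of a version-tolerant custom callback. It assumes the class-based `xgboost.callback.TrainingCallback` interface (available since xgboost 1.3) and falls back to a plain base class when that import fails. The class name `EarlyStopSketch`, the helper `run_simulated`, and the `"tr"`/`"rmse"` keys are hypothetical and are not taken from this PR's code.

```python
try:
    # xgboost >= 1.3 provides a class-based callback interface.
    from xgboost.callback import TrainingCallback
except ImportError:
    # Older xgboost (or none installed): use a plain base class so the
    # sketch still imports and can be exercised in isolation.
    class TrainingCallback:  # type: ignore[no-redef]
        pass


class EarlyStopSketch(TrainingCallback):
    """Hypothetical early-stopping callback (illustration only)."""

    def __init__(self, rounds, data_name="tr", metric_name="rmse"):
        super().__init__()
        self.rounds = rounds
        self.data_name = data_name
        self.metric_name = metric_name
        self.best_score = float("inf")
        self.best_epoch = 0

    def after_iteration(self, model, epoch, evals_log):
        # evals_log maps data name -> metric name -> list of per-round scores.
        score = evals_log[self.data_name][self.metric_name][-1]
        if score < self.best_score:
            self.best_score = score
            self.best_epoch = epoch
        # Returning True asks the training loop to stop.
        return epoch - self.best_epoch >= self.rounds


def run_simulated(scores, rounds):
    """Drive the callback with fake scores; return the epoch where it stops."""
    callback = EarlyStopSketch(rounds)
    log = {"tr": {"rmse": []}}
    for epoch, score in enumerate(scores):
        log["tr"]["rmse"].append(score)
        if callback.after_iteration(None, epoch, log):
            return epoch
    return len(scores) - 1
```

With a real booster, an instance would be passed through the `callbacks` argument of `xgboost.train`; the simulated driver above only exists so the stopping logic can be checked without xgboost installed.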

cc: @zxybazh @junrushao1994

shingjan, Jul 19 '22 23:07

Local integration test for resnet18/llvm:

 ID |                                                                        Name |      FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Terminated 
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  0 |                                                         fused_nn_conv2d_add |  12870144 |      1 |       375.4210 |      34.2819 |               34.2819 |     32 |          Y 
  1 |                                                       fused_nn_conv2d_add_1 |  12895232 |      1 |       398.5375 |      32.3564 |               32.3564 |     32 |          Y 
  2 |                                                       fused_nn_conv2d_add_2 |  12945408 |      1 |       464.8020 |      27.8514 |               27.8514 |     32 |          Y 
  3 |                                                      fused_layout_transform |         1 |      1 |         0.0002 |       5.7608 |                5.7608 |      2 |          Y 
  4 |                                                 fused_nn_conv2d_add_nn_relu | 237633536 |      1 |       387.8015 |     612.7711 |              612.7711 |     32 |          Y 
  5 |                                                         fused_nn_max_pool2d |   1806336 |      1 |       157.2717 |      11.4854 |               11.4854 |     32 |          Y 
  6 |                                               fused_nn_conv2d_add_nn_relu_1 | 231612416 |      2 |       383.7106 |     603.6122 |             1207.2245 |     32 |          Y 
  7 |                                             fused_nn_conv2d_add_add_nn_relu | 231813120 |      2 |       442.1804 |     524.2501 |             1048.5002 |     32 |          Y 
  8 |                                               fused_nn_conv2d_add_nn_relu_2 | 115806208 |      1 |       362.1544 |     319.7703 |              319.7703 |     32 |          Y 
  9 |       fused_nn_contrib_conv2d_winograd_without_weight_transform_add_nn_relu |  93227008 |      1 |       293.8712 |     317.2377 |              317.2377 |     32 |          Y 
 10 |   fused_nn_contrib_conv2d_winograd_without_weight_transform_add_add_nn_relu |  93327360 |      2 |       281.1145 |     331.9906 |              663.9812 |     32 |          Y 
 11 |                                               fused_nn_conv2d_add_nn_relu_3 | 115705856 |      1 |       437.5283 |     264.4534 |              264.4534 |     32 |          Y 
 12 |     fused_nn_contrib_conv2d_winograd_without_weight_transform_add_nn_relu_1 |  98600960 |      1 |       330.2098 |     298.6010 |              298.6010 |     32 |          Y 
 13 | fused_nn_contrib_conv2d_winograd_without_weight_transform_add_add_nn_relu_1 |  98651136 |      2 |       298.1799 |     330.8444 |              661.6887 |     32 |          Y 
 14 |                                               fused_nn_conv2d_add_nn_relu_4 | 115655680 |      1 |       381.0609 |     303.5097 |              303.5097 |     32 |          Y 
 15 |                                               fused_nn_conv2d_add_nn_relu_5 | 231261184 |      1 |       408.4514 |     566.1902 |              566.1902 |     32 |          Y 
 16 |                                           fused_nn_conv2d_add_add_nn_relu_1 | 231286272 |      2 |       332.2502 |     696.1209 |             1392.2417 |     32 |          Y 
 17 |                                                fused_nn_adaptive_avg_pool2d |     25600 |      1 |         5.7029 |       4.4890 |                4.4890 |     32 |          Y 
 18 |                                      fused_layout_transform_reshape_squeeze |         1 |      1 |         0.0003 |       3.6907 |                3.6907 |      1 |            
 19 |                                                          fused_nn_dense_add |   1025000 |      1 |       161.2829 |       6.3553 |                6.3553 |     32 |          Y 
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
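
As a reading aid for the table above, the following sketch shows how the derived columns appear to relate to the raw ones: speed is FLOP divided by latency, and weighted latency is latency multiplied by weight. The function names are illustrative assumptions; the input numbers are copied from two rows of the table.

```python
def speed_gflops(flop: int, latency_us: float) -> float:
    """FLOP per microsecond is MFLOP/s, so divide by 1e3 to get GFLOP/s."""
    return flop / latency_us / 1e3


def weighted_latency_us(latency_us: float, weight: int) -> float:
    """Latency scaled by the task's weight."""
    return latency_us * weight


# (name, FLOP, weight, latency in us), copied from the resnet18/llvm table.
rows = [
    ("fused_nn_conv2d_add", 12870144, 1, 34.2819),
    ("fused_nn_conv2d_add_nn_relu_1", 231612416, 2, 603.6122),
]

for name, flop, weight, latency in rows:
    speed = speed_gflops(flop, latency)
    weighted = weighted_latency_us(latency, weight)
    print(f"{name}: {speed:.4f} GFLOPS, {weighted:.4f} us weighted")
```

The results agree with the table's Speed and Weighted Latency columns up to rounding of the printed latency.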

Profiler table:

 ID |                                        Name | Time (min) | Percentage 
----------------------------------------------------------------------------
    |                                       Total |    61.9612 |   100.0000 
  1 |                   EvoSearch/Evolve/Mutation |    30.6629 |    49.4873 
  2 |              EvoSearch/SampleInitPopulation |     8.4259 |    13.5987 
  3 |                               SendToBuilder |     8.4254 |    13.5979 
  4 |                       EvoSearch/Evolve/Misc |     6.0477 |     9.7604 
  5 |     EvoSearch/Evolve/PredictNormalizedScore |     3.3436 |     5.3962 
  6 |                                SendToRunner |     2.3616 |     3.8115 
  7 |                            ApplyHistoryBest |     1.5547 |     2.5091 
  8 |                              TaskExtraction |     0.4576 |     0.7386 
  9 |             MeasureCallback/UpdateCostModel |     0.1178 |     0.1901 
 10 |                              InitializeTask |     0.1092 |     0.1762 
 11 |               MeasureCallback/AddToDatabase |     0.0181 |     0.0292 
 12 |                 EvoSearch/PickWithEpsGreedy |     0.0152 |     0.0245 
 13 |              EvoSearch/PickBestFromDatabase |     0.0149 |     0.0241 
 14 |              MeasureCallback/EchoStatistics |     0.0050 |     0.0081 
 15 |         MeasureCallback/RemoveBuildArtifact |     0.0009 |     0.0015 
 16 |                           JoinRunnerFutures |     0.0003 |     0.0005 
 17 | EvoSearch/Evolve/Misc/CopyMeasuredWorkloads |     0.0000 |     0.0000 
----------------------------------------------------------------------------
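
The Percentage column of the profiler table can be reproduced from the Time column and the Total row. A small sketch, with a hypothetical helper name and values copied from the table above:

```python
def percentage(time_min: float, total_min: float) -> float:
    """Share of the total tuning time spent in one phase, in percent."""
    return 100.0 * time_min / total_min


total_min = 61.9612  # "Total" row of the resnet18/llvm profiler table
phases = {
    "EvoSearch/Evolve/Mutation": 30.6629,
    "SendToBuilder": 8.4254,
    "MeasureCallback/UpdateCostModel": 0.1178,
}

for name, minutes in phases.items():
    print(f"{name}: {percentage(minutes, total_min):.4f}%")
```

For this run, the cost model update (the phase that invokes xgboost training) accounts for well under 1% of the total time.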

shingjan, Jul 27 '22 23:07

@tvm-bot rerun

shingjan, Jul 28 '22 07:07

bert base llvm:

 ID |                                                              Name |      FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Terminated 
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  0 |                                                        fused_take |         1 |      1 |         0.0002 |       5.2482 |                5.2482 |      1 |            
  1 |                                      fused_nn_dense_add_fast_tanh |   1204224 |      1 |       114.1531 |      10.5492 |               10.5492 |     32 |          Y 
  2 |                       fused_reshape_add_reshape_transpose_reshape |     49152 |     12 |         1.7456 |      28.1570 |              337.8840 |      1 |          Y 
  3 |                                                    fused_variance |    147520 |     25 |        17.0011 |       8.6771 |              216.9272 |     32 |          Y 
  4 |                                                        fused_mean |     49216 |     25 |         6.1650 |       7.9831 |              199.5783 |     32 |          Y 
  5 |                                               fused_cast_take_add |     49152 |      1 |         2.5341 |      19.3960 |               19.3960 |      2 |          Y 
  6 |                     fused_reshape_add_reshape_transpose_reshape_1 |     49152 |     24 |         4.7436 |      10.3617 |              248.6808 |      1 |          Y 
  7 |                                          fused_reshape_divide_add |     98304 |     12 |        12.9576 |       7.5866 |               91.0392 |      2 |          Y 
  8 |                                             fused_nn_fast_softmax |   4374528 |     12 |        89.9165 |      48.6510 |              583.8123 |     32 |          Y 
  9 |                                                     fused_reshape |         1 |     12 |         0.0000 |     117.9034 |             1414.8410 |      1 |          Y 
 10 |                                             fused_nn_batch_matmul |   6291456 |     24 |       141.6181 |      44.4255 |             1066.2123 |     32 |          Y 
 11 |                                   fused_reshape_transpose_reshape |         1 |     12 |         0.0000 |      29.6311 |              355.5735 |      1 |          Y 
 12 |                                                    fused_nn_dense |  75497472 |     48 |       191.4222 |     394.4030 |            18931.3435 |     32 |          Y 
 13 |                                                   fused_reshape_1 |         1 |     24 |         0.0001 |      10.0435 |              241.0438 |      1 |          Y 
 14 |                                                  fused_nn_dense_1 | 301989888 |     12 |       178.7608 |    1689.3522 |            20272.2265 |     32 |          Y 
 15 | fused_reshape_add_multiply_fast_erf_multiply_add_multiply_reshape |  15532032 |     12 |         4.0672 |    3818.8959 |            45826.7502 |      1 |          Y 
 16 |                                                  fused_nn_dense_2 | 301989888 |     12 |       240.5859 |    1255.2267 |            15062.7200 |     32 |          Y 
 17 |                                             fused_reshape_add_add |     98304 |     24 |        12.5405 |       7.8389 |              188.1338 |      2 |          Y 
 18 |                       fused_subtract_add_sqrt_divide_multiply_add |    196672 |     25 |        12.7443 |      15.4322 |              385.8043 |      2 |          Y 
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Profiler table:

 ID |                                        Name | Time (min) | Percentage 
----------------------------------------------------------------------------
    |                                       Total |    15.5728 |   100.0000 
  1 |             MeasureCallback/UpdateCostModel |     5.2182 |    33.5082 
  2 |     EvoSearch/Evolve/PredictNormalizedScore |     2.4700 |    15.8609 
  3 |                   EvoSearch/Evolve/Mutation |     2.1395 |    13.7387 
  4 |                                SendToRunner |     1.5999 |    10.2737 
  5 |                       EvoSearch/Evolve/Misc |     1.5694 |    10.0778 
  6 |                               SendToBuilder |     0.9345 |     6.0006 
  7 |              EvoSearch/SampleInitPopulation |     0.7653 |     4.9146 
  8 |                            ApplyHistoryBest |     0.5334 |     3.4250 
  9 |                              TaskExtraction |     0.1634 |     1.0490 
 10 |                              InitializeTask |     0.0280 |     0.1798 
 11 |                 EvoSearch/PickWithEpsGreedy |     0.0047 |     0.0304 
 12 |              EvoSearch/PickBestFromDatabase |     0.0036 |     0.0232 
 13 |               MeasureCallback/AddToDatabase |     0.0019 |     0.0125 
 14 |         MeasureCallback/RemoveBuildArtifact |     0.0004 |     0.0028 
 15 |              MeasureCallback/EchoStatistics |     0.0002 |     0.0010 
 16 |                           JoinRunnerFutures |     0.0002 |     0.0010 
 17 | EvoSearch/Evolve/Misc/CopyMeasuredWorkloads |     0.0000 |     0.0000 
----------------------------------------------------------------------------

shingjan, Jul 28 '22 08:07

bert base cuda:

 ID |                                                              Name |      FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Terminated 
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  0 |                                                        fused_take |         1 |      1 |         0.0005 |       2.1319 |                2.1319 |      5 |            
  1 |                                      fused_nn_dense_add_fast_tanh |   1204224 |      1 |        36.6140 |      32.8897 |               32.8897 |     32 |          Y 
  2 |                       fused_reshape_add_reshape_transpose_reshape |     49152 |     12 |        13.5008 |       3.6407 |               43.6879 |      6 |          Y 
  3 |                                                    fused_variance |    147520 |     25 |        65.9260 |       2.2377 |               55.9415 |     32 |          Y 
  4 |                                                        fused_mean |     49216 |     25 |        21.9872 |       2.2384 |               55.9597 |     32 |          Y 
  5 |                                               fused_cast_take_add |     49152 |      1 |        20.9740 |       2.3435 |                2.3435 |      6 |            
  6 |                     fused_reshape_add_reshape_transpose_reshape_1 |     49152 |     24 |        20.6382 |       2.3816 |               57.1585 |      6 |          Y 
  7 |                                          fused_reshape_divide_add |     98304 |     12 |        43.8752 |       2.2405 |               26.8864 |      6 |          Y 
  8 |                                             fused_nn_fast_softmax |   4374528 |     12 |      1141.5252 |       3.8322 |               45.9861 |     32 |          Y 
  9 |                                                     fused_reshape |         1 |     12 |         0.0005 |       2.1836 |               26.2035 |      6 |          Y 
 10 |                                             fused_nn_batch_matmul |   6291456 |     24 |       684.4451 |       9.1921 |              220.6093 |     32 |          Y 
 11 |                                   fused_reshape_transpose_reshape |         1 |     12 |         0.0005 |       2.1763 |               26.1151 |      6 |          Y 
 12 |                                                    fused_nn_dense |  75497472 |     48 |       918.1956 |      82.2237 |             3946.7393 |     32 |          Y 
 13 |                                                   fused_reshape_1 |         1 |     24 |         0.0005 |       2.1895 |               52.5487 |      6 |          Y 
 14 |                                                  fused_nn_dense_1 | 301989888 |     12 |      2381.8300 |     126.7890 |             1521.4682 |     32 |          Y 
 15 | fused_reshape_add_multiply_fast_erf_multiply_add_multiply_reshape |  15532032 |     12 |      4892.7944 |       3.1745 |               38.0936 |      6 |          Y 
 16 |                                                  fused_nn_dense_2 | 301989888 |     12 |      1758.6493 |     171.7170 |             2060.6034 |     32 |          Y 
 17 |                                             fused_reshape_add_add |     98304 |     24 |        39.4395 |       2.4925 |               59.8207 |      6 |          Y 
 18 |                       fused_subtract_add_sqrt_divide_multiply_add |    196672 |     25 |        72.4898 |       2.7131 |               67.8275 |      6 |          Y 
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------

profiler table:

 ID |                                        Name | Time (min) | Percentage 
----------------------------------------------------------------------------
    |                                       Total |    22.2403 |   100.0000 
  1 |     EvoSearch/Evolve/PredictNormalizedScore |     9.5203 |    42.8065 
  2 |                   EvoSearch/Evolve/Mutation |     3.3615 |    15.1146 
  3 |                               SendToBuilder |     2.3562 |    10.5943 
  4 |              EvoSearch/SampleInitPopulation |     2.3124 |    10.3975 
  5 |                       EvoSearch/Evolve/Misc |     2.1767 |     9.7870 
  6 |                                SendToRunner |     1.6900 |     7.5987 
  7 |                            ApplyHistoryBest |     0.3483 |     1.5662 
  8 |                              TaskExtraction |     0.2121 |     0.9535 
  9 |             MeasureCallback/UpdateCostModel |     0.0500 |     0.2248 
 10 |              EvoSearch/PickBestFromDatabase |     0.0158 |     0.0710 
 11 |                              InitializeTask |     0.0095 |     0.0429 
 12 |                 EvoSearch/PickWithEpsGreedy |     0.0069 |     0.0310 
 13 |               MeasureCallback/AddToDatabase |     0.0029 |     0.0130 
 14 |         MeasureCallback/RemoveBuildArtifact |     0.0008 |     0.0037 
 15 |              MeasureCallback/EchoStatistics |     0.0006 |     0.0028 
 16 |                           JoinRunnerFutures |     0.0003 |     0.0012 
 17 | EvoSearch/Evolve/Misc/CopyMeasuredWorkloads |     0.0000 |     0.0000 
----------------------------------------------------------------------------

shingjan, Jul 28 '22 08:07

resnet18 cuda:

 ID |                                                                        Name |      FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Terminated 
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  0 |                                                         fused_nn_conv2d_add |  12870144 |      1 |       965.5146 |      13.3298 |               13.3298 |     32 |          Y 
  1 |                                                       fused_nn_conv2d_add_1 |  12895232 |      1 |      1330.3102 |       9.6934 |                9.6934 |     32 |          Y 
  2 |                                                       fused_nn_conv2d_add_2 |  12945408 |      1 |      2103.2869 |       6.1548 |                6.1548 |     32 |          Y 
  3 |                                                      fused_layout_transform |         1 |      1 |         0.0002 |       5.0254 |                5.0254 |      6 |          Y 
  4 |                                                 fused_nn_conv2d_add_nn_relu | 237633536 |      1 |      6085.8811 |      39.0467 |               39.0467 |     32 |          Y 
  5 |                                                         fused_nn_max_pool2d |   1806336 |      1 |       328.9316 |       5.4915 |                5.4915 |     30 |          Y 
  6 |       fused_nn_contrib_conv2d_winograd_without_weight_transform_add_nn_relu | 128651264 |      2 |      2512.3525 |      51.2075 |              102.4150 |     32 |          Y 
  7 |   fused_nn_contrib_conv2d_winograd_without_weight_transform_add_add_nn_relu | 128851968 |      2 |      1360.9069 |      94.6810 |              189.3619 |     32 |          Y 
  8 |                                               fused_nn_conv2d_add_nn_relu_1 | 115806208 |      1 |      2482.7300 |      46.6447 |               46.6447 |     32 |          Y 
  9 |     fused_nn_contrib_conv2d_winograd_without_weight_transform_add_nn_relu_1 | 127045632 |      1 |      3352.8175 |      37.8922 |               37.8922 |     32 |          Y 
 10 | fused_nn_contrib_conv2d_winograd_without_weight_transform_add_add_nn_relu_1 | 127145984 |      2 |      1854.8033 |      68.5496 |              137.0992 |     32 |          Y 
 11 |                                               fused_nn_conv2d_add_nn_relu_2 | 115705856 |      1 |      3359.4190 |      34.4422 |               34.4422 |     32 |          Y 
 12 |     fused_nn_contrib_conv2d_winograd_without_weight_transform_add_nn_relu_2 | 114903040 |      1 |      2106.9193 |      54.5360 |               54.5360 |     32 |          Y 
 13 | fused_nn_contrib_conv2d_winograd_without_weight_transform_add_add_nn_relu_2 | 114953216 |      2 |      1723.1163 |      66.7124 |              133.4248 |     32 |          Y 
 14 |                                               fused_nn_conv2d_add_nn_relu_3 | 115655680 |      1 |      1007.9003 |     114.7491 |              114.7491 |     32 |          Y 
 15 |     fused_nn_contrib_conv2d_winograd_without_weight_transform_add_nn_relu_3 | 142132224 |      1 |      1615.3274 |      87.9897 |               87.9897 |     32 |          Y 
 16 | fused_nn_contrib_conv2d_winograd_without_weight_transform_add_add_nn_relu_3 | 142157312 |      2 |      1053.2288 |     134.9729 |              269.9457 |     32 |          Y 
 17 |                                                fused_nn_adaptive_avg_pool2d |     25600 |      1 |         5.8995 |       4.3393 |                4.3393 |     32 |          Y 
 18 |                                      fused_layout_transform_reshape_squeeze |         1 |      1 |         0.0003 |       3.2615 |                3.2615 |      5 |            
 19 |                                                          fused_nn_dense_add |   1025000 |      1 |        68.4000 |      14.9854 |               14.9854 |     32 |          Y 
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

profiler table:

 ID |                                        Name | Time (min) | Percentage 
----------------------------------------------------------------------------
    |                                       Total |    83.8914 |   100.0000 
  1 |                   EvoSearch/Evolve/Mutation |    48.0123 |    57.2314 
  2 |                               SendToBuilder |    13.7197 |    16.3541 
  3 |              EvoSearch/SampleInitPopulation |     7.9949 |     9.5300 
  4 |     EvoSearch/Evolve/PredictNormalizedScore |     4.0848 |     4.8691 
  5 |                                SendToRunner |     3.6898 |     4.3983 
  6 |                       EvoSearch/Evolve/Misc |     2.7077 |     3.2277 
  7 |             MeasureCallback/UpdateCostModel |     1.8705 |     2.2297 
  8 |                            ApplyHistoryBest |     0.7764 |     0.9254 
  9 |                              TaskExtraction |     0.5058 |     0.6030 
 10 |                              InitializeTask |     0.0267 |     0.0318 
 11 |               MeasureCallback/AddToDatabase |     0.0142 |     0.0170 
 12 |              EvoSearch/PickBestFromDatabase |     0.0131 |     0.0157 
 13 |                 EvoSearch/PickWithEpsGreedy |     0.0100 |     0.0119 
 14 |              MeasureCallback/EchoStatistics |     0.0037 |     0.0044 
 15 |         MeasureCallback/RemoveBuildArtifact |     0.0019 |     0.0023 
 16 |                           JoinRunnerFutures |     0.0005 |     0.0006 
 17 | EvoSearch/Evolve/Misc/CopyMeasuredWorkloads |     0.0000 |     0.0000 
----------------------------------------------------------------------------

shingjan, Jul 28 '22 15:07

mobilenetv2 on cuda

 ID |                                   Name |     FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Terminated 
-----------------------------------------------------------------------------------------------------------------------------------------------
  0 |                 fused_layout_transform |        1 |      1 |         0.0004 |       2.2798 |                2.2798 |      6 |            
  1 |               fused_nn_conv2d_add_clip | 22880256 |      1 |      3151.3187 |       7.2605 |                7.2605 |     32 |          Y 
  2 |             fused_nn_conv2d_add_clip_1 |  8429568 |      1 |      1285.2570 |       6.5587 |                6.5587 |     32 |          Y 
  3 |                    fused_nn_conv2d_add | 13045760 |      1 |      2104.8376 |       6.1980 |                6.1980 |     32 |          Y 
  4 |             fused_nn_conv2d_add_clip_2 | 42147840 |      1 |      2994.8494 |      14.0734 |               14.0734 |     32 |          Y 
  5 |             fused_nn_conv2d_add_clip_3 |  6322176 |      1 |       682.4610 |       9.2638 |                9.2638 |     32 |          Y 
  6 |                  fused_nn_conv2d_add_1 | 14525952 |      1 |      1936.7547 |       7.5002 |                7.5002 |     32 |          Y 
  7 |             fused_nn_conv2d_add_clip_4 |  9483264 |      1 |      1537.1009 |       6.1696 |                6.1696 |     32 |          Y 
  8 |                fused_nn_conv2d_add_add | 21826560 |      1 |      2005.0549 |      10.8858 |               10.8858 |     32 |          Y 
  9 |             fused_nn_conv2d_add_clip_5 | 23030784 |      2 |      1914.2627 |      12.0312 |               24.0623 |     32 |          Y 
 10 |             fused_nn_conv2d_add_clip_6 |  2370816 |      1 |       393.0634 |       6.0316 |                6.0316 |     32 |          Y 
 11 |                  fused_nn_conv2d_add_2 |  7250432 |      1 |       917.3106 |       7.9040 |                7.9040 |     32 |          Y 
 12 |             fused_nn_conv2d_add_clip_7 |  3161088 |      2 |       262.2023 |      12.0559 |               24.1118 |     32 |          Y 
 13 |              fused_nn_conv2d_add_add_1 |  9683968 |      2 |      1061.2357 |       9.1252 |               18.2504 |     32 |          Y 
 14 |             fused_nn_conv2d_add_clip_8 | 10085376 |      3 |       737.1134 |      13.6823 |               41.0468 |     32 |          Y 
 15 |             fused_nn_conv2d_add_clip_9 |   790272 |      1 |       170.2160 |       4.6428 |                4.6428 |     32 |          Y 
 16 |                  fused_nn_conv2d_add_3 |  4829440 |      1 |       957.4766 |       5.0439 |                5.0439 |     32 |          Y 
 17 |              fused_nn_conv2d_add_add_2 |  9658880 |      3 |       919.5057 |      10.5044 |               31.5133 |     32 |          Y 
 18 |            fused_nn_conv2d_add_clip_10 |  9859584 |      4 |      1410.7424 |       6.9889 |               27.9557 |     32 |          Y 
 19 |            fused_nn_conv2d_add_clip_11 |  1580544 |      4 |       361.8447 |       4.3680 |               17.4721 |     32 |          Y 
 20 |                  fused_nn_conv2d_add_4 | 14469504 |      1 |       739.5858 |      19.5643 |               19.5643 |     32 |          Y 
 21 |            fused_nn_conv2d_add_clip_12 |  2370816 |      2 |       503.2051 |       4.7114 |                9.4229 |     32 |          Y 
 22 |              fused_nn_conv2d_add_add_3 | 21713664 |      2 |      1405.4021 |      15.4501 |               30.9003 |     32 |          Y 
 23 |            fused_nn_conv2d_add_clip_13 | 22014720 |      3 |      2486.7910 |       8.8527 |               26.5580 |     32 |          Y 
 24 |            fused_nn_conv2d_add_clip_14 |   592704 |      1 |       125.2444 |       4.7324 |                4.7324 |     32 |          Y 
 25 |                  fused_nn_conv2d_add_5 |  9039520 |      1 |       410.3605 |      22.0282 |               22.0282 |     32 |          Y 
 26 |              fused_nn_conv2d_add_add_4 | 15068480 |      2 |       411.3220 |      36.6343 |               73.2685 |     32 |          Y 
 27 |            fused_nn_conv2d_add_clip_15 | 15193920 |      3 |      1503.2292 |      10.1075 |               30.3226 |     32 |          Y 
 28 |            fused_nn_conv2d_add_clip_16 |   987840 |      3 |       224.3443 |       4.4032 |               13.2097 |     32 |          Y 
 29 |                  fused_nn_conv2d_add_6 | 30121280 |      1 |      1749.6604 |      17.2155 |               17.2155 |     32 |          Y 
 30 |            fused_nn_conv2d_add_clip_17 | 40328960 |      1 |      2609.1046 |      15.4570 |               15.4570 |     32 |          Y 
 31 |           fused_nn_adaptive_avg_pool2d |    64000 |      1 |        16.9965 |       3.7655 |                3.7655 |     32 |          Y 
 32 | fused_layout_transform_reshape_squeeze |        1 |      1 |         0.0002 |       4.3296 |                4.3296 |      6 |          Y 
 33 |                     fused_nn_dense_add |  2561000 |      1 |        66.7119 |      38.3890 |               38.3890 |     32 |          Y 
-----------------------------------------------------------------------------------------------------------------------------------------------

Profiler table:

 ID |                                        Name | Time (min) | Percentage 
----------------------------------------------------------------------------
    |                                       Total |    82.0160 |   100.0000 
  1 |                   EvoSearch/Evolve/Mutation |    42.1468 |    51.3885 
  2 |                               SendToBuilder |    15.1365 |    18.4556 
  3 |              EvoSearch/SampleInitPopulation |     7.9139 |     9.6492 
  4 |                                SendToRunner |     6.4504 |     7.8648 
  5 |     EvoSearch/Evolve/PredictNormalizedScore |     2.7957 |     3.4087 
  6 |             MeasureCallback/UpdateCostModel |     2.7350 |     3.3348 
  7 |                       EvoSearch/Evolve/Misc |     2.6240 |     3.1994 
  8 |                            ApplyHistoryBest |     1.1672 |     1.4232 
  9 |                              TaskExtraction |     0.4503 |     0.5490 
 10 |                              InitializeTask |     0.0250 |     0.0304 
 11 |               MeasureCallback/AddToDatabase |     0.0198 |     0.0241 
 12 |                 EvoSearch/PickWithEpsGreedy |     0.0100 |     0.0122 
 13 |         MeasureCallback/RemoveBuildArtifact |     0.0034 |     0.0041 
 14 |              EvoSearch/PickBestFromDatabase |     0.0032 |     0.0039 
 15 |              MeasureCallback/EchoStatistics |     0.0027 |     0.0033 
 16 |                           JoinRunnerFutures |     0.0013 |     0.0016 
 17 | EvoSearch/Evolve/Misc/CopyMeasuredWorkloads |     0.0000 |     0.0000 
----------------------------------------------------------------------------

shingjan, Jul 28 '22 15:07

bert base on llvm 20k trials:

 ID |                                                              Name |      FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Terminated 
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  0 |                                                        fused_take |         1 |      1 |         0.0001 |      12.9686 |               12.9686 |      1 |          Y 
  1 |                                      fused_nn_dense_add_fast_tanh |   1204224 |      1 |        84.5479 |      14.2431 |               14.2431 |     32 |          Y 
  2 |                       fused_reshape_add_reshape_transpose_reshape |     49152 |     12 |         5.3101 |       9.2562 |              111.0749 |      1 |          Y 
  3 |                                                    fused_variance |    147520 |     25 |        21.8394 |       6.7548 |              168.8690 |    191 |          Y 
  4 |                                                        fused_mean |     49216 |     25 |        11.7478 |       4.1894 |              104.7344 |    159 |          Y 
  5 |                                               fused_cast_take_add |     49152 |      1 |         3.6734 |      13.3805 |               13.3805 |      2 |          Y 
  6 |                     fused_reshape_add_reshape_transpose_reshape_1 |     49152 |     24 |         0.4843 |     101.4931 |             2435.8337 |      1 |          Y 
  7 |                                          fused_reshape_divide_add |     98304 |     12 |        12.6803 |       7.7525 |               93.0296 |      2 |          Y 
  8 |                                             fused_nn_fast_softmax |   4374528 |     12 |       207.0953 |      21.1233 |              253.4791 |    288 |          Y 
  9 |                                                     fused_reshape |         1 |     12 |         0.0001 |      12.0269 |              144.3223 |      1 |          Y 
 10 |                                             fused_nn_batch_matmul |   6291456 |     24 |       462.0523 |      13.6163 |              326.7919 |    384 |          Y 
 11 |                                   fused_reshape_transpose_reshape |         1 |     12 |         0.0000 |      66.8140 |              801.7686 |      1 |          Y 
 12 |                                                    fused_nn_dense |  75497472 |     48 |       613.1287 |     123.1348 |             5910.4700 |   6656 |            
 13 |                                                   fused_reshape_1 |         1 |     24 |         0.0000 |      49.1952 |             1180.6855 |      1 |          Y 
 14 |                                                  fused_nn_dense_1 | 301989888 |     12 |       664.1287 |     454.7159 |             5456.5913 |   6144 |            
 15 | fused_reshape_add_multiply_fast_erf_multiply_add_multiply_reshape |  15532032 |     12 |        32.6868 |     475.1782 |             5702.1385 |      1 |          Y 
 16 |                                                  fused_nn_dense_2 | 301989888 |     12 |       662.0116 |     456.1701 |             5474.0410 |   6144 |            
 17 |                                             fused_reshape_add_add |     98304 |     24 |         1.3333 |      73.7283 |             1769.4793 |      2 |          Y 
 18 |                       fused_subtract_add_sqrt_divide_multiply_add |    196672 |     25 |         2.6162 |      75.1739 |             1879.3469 |      2 |          Y 
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Total trials: 20013
Total latency (us): 31853.2
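
For anyone reading the table: the two totals look like plain column sums. The sketch below checks that against the numbers above. The relationships (weighted latency = weight × latency, total latency = sum of the weighted column) are inferred from the figures in this table, not taken from the TVM source, and the variable names are illustrative only.

```python
# (weight, latency_us, weighted_latency_us, trials), copied from the table above.
ROWS = [
    (1, 12.9686, 12.9686, 1),
    (1, 14.2431, 14.2431, 32),
    (12, 9.2562, 111.0749, 1),
    (25, 6.7548, 168.8690, 191),
    (25, 4.1894, 104.7344, 159),
    (1, 13.3805, 13.3805, 2),
    (24, 101.4931, 2435.8337, 1),
    (12, 7.7525, 93.0296, 2),
    (12, 21.1233, 253.4791, 288),
    (12, 12.0269, 144.3223, 1),
    (24, 13.6163, 326.7919, 384),
    (12, 66.8140, 801.7686, 1),
    (48, 123.1348, 5910.4700, 6656),
    (24, 49.1952, 1180.6855, 1),
    (12, 454.7159, 5456.5913, 6144),
    (12, 475.1782, 5702.1385, 1),
    (12, 456.1701, 5474.0410, 6144),
    (24, 73.7283, 1769.4793, 2),
    (25, 75.1739, 1879.3469, 2),
]

# Largest gap between weight * latency and the reported weighted latency.
# A small gap is expected because the table shows rounded latencies.
max_deviation = max(abs(w * lat - wl) for w, lat, wl, _ in ROWS)

total_latency_us = sum(wl for _, _, wl, _ in ROWS)
total_trials = sum(t for _, _, _, t in ROWS)

print(f"Max |weight * latency - weighted|: {max_deviation:.4f}")
print(f"Total trials: {total_trials}")
print(f"Total latency (us): {total_latency_us:.1f}")
```

The three `fused_nn_dense*` tasks (IDs 12, 14, 16) account for 18,944 of the trials and are the only rows without a `Y` in the Terminated column.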

Profiler table:

 ID |                                        Name | Time (min) | Percentage 
----------------------------------------------------------------------------
    |                                       Total |   359.8455 |   100.0000 
  1 |                                SendToRunner |   118.7806 |    33.0088 
  2 |     EvoSearch/Evolve/PredictNormalizedScore |    62.0087 |    17.2320 
  3 |                               SendToBuilder |    56.9247 |    15.8192 
  4 |             MeasureCallback/UpdateCostModel |    42.1284 |    11.7074 
  5 |                   EvoSearch/Evolve/Mutation |    40.9665 |    11.3845 
  6 |                       EvoSearch/Evolve/Misc |    21.9481 |     6.0993 
  7 |              EvoSearch/SampleInitPopulation |     7.9898 |     2.2203 
  8 |              EvoSearch/PickBestFromDatabase |     2.4416 |     0.6785 
  9 |                            ApplyHistoryBest |     0.5137 |     0.1428 
 10 |               MeasureCallback/AddToDatabase |     0.1833 |     0.0509 
 11 |                              TaskExtraction |     0.1798 |     0.0500 
 12 |                 EvoSearch/PickWithEpsGreedy |     0.0540 |     0.0150 
 13 |         MeasureCallback/RemoveBuildArtifact |     0.0453 |     0.0126 
 14 |                              InitializeTask |     0.0440 |     0.0122 
 15 |              MeasureCallback/EchoStatistics |     0.0310 |     0.0086 
 16 |                           JoinRunnerFutures |     0.0118 |     0.0033 
 17 | EvoSearch/Evolve/Misc/CopyMeasuredWorkloads |     0.0116 |     0.0032 
----------------------------------------------------------------------------
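
The Percentage column appears to be each entry's time divided by the Total row. A minimal sketch, assuming that reading, reproduces the top four entries. The helper name `percentage` is made up for illustration and is not part of the meta schedule profiler.

```python
TOTAL_MIN = 359.8455  # "Total" row of the profiler table above, in minutes

# Top four entries, copied from the table above (time in minutes).
TIMES_MIN = {
    "SendToRunner": 118.7806,
    "EvoSearch/Evolve/PredictNormalizedScore": 62.0087,
    "SendToBuilder": 56.9247,
    "MeasureCallback/UpdateCostModel": 42.1284,
}


def percentage(minutes: float, total: float = TOTAL_MIN) -> float:
    """Share of the total run time, in percent."""
    return 100.0 * minutes / total


pct = {name: percentage(minutes) for name, minutes in TIMES_MIN.items()}
for name, value in pct.items():
    print(f"{name:>45} | {value:8.4f}")
```

Going by the reported percentages, cost-model work (`PredictNormalizedScore` plus `UpdateCostModel`) comes to roughly 29% of this run, slightly less than `SendToRunner` alone.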

shingjan avatar Jul 28 '22 15:07 shingjan

@zxybazh The pandas warning should be suppressed now with the last commit.
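
The thread does not show how the warning was silenced. For readers unfamiliar with the general technique, here is a minimal standard-library sketch of suppressing one warning category around one call. The warning category (`FutureWarning`) and the function `noisy_call` are assumptions for illustration and are not taken from the commit.

```python
import warnings


def noisy_call() -> int:
    # Hypothetical stand-in for a library call that emits a warning.
    warnings.warn("this behaviour will change", FutureWarning)
    return 42


def quiet_call() -> int:
    # The filter applies only inside the context manager; the previous
    # warning filters are restored when the block exits.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=FutureWarning)
        return noisy_call()


result = quiet_call()
```

Scoping the filter to a single call keeps unrelated warnings visible elsewhere in the process.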

shingjan avatar Sep 21 '22 06:09 shingjan

@tvm-bot rerun

zxybazh avatar Sep 25 '22 23:09 zxybazh