tvm
[Meta Schedule][XGBoost] Update the custom callback function of xgboost in meta schedule
This PR updates the custom XGBoost callback function used in meta schedule.
The change was tested against xgboost 1.2.0, 1.5.2, and 1.6.0 by running tests/python/unittest/test_meta_schedule_cost_model.py, to confirm backward compatibility.
This is related to the second action item in #12009.
cc: @zxybazh @junrushao1994
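For readers unfamiliar with why several xgboost versions need testing, here is a minimal sketch of a version-tolerant callback. It is not the code in this PR. It assumes that the class-based interface (`xgboost.callback.TrainingCallback`) is available from xgboost 1.3.0 onward, and the names `parse_version`, `has_class_based_callbacks`, and `PatienceStop` are hypothetical helpers introduced only for illustration.

```python
import re
from typing import Dict, List, Tuple

try:
    # Assumption: present in xgboost >= 1.3.0.
    from xgboost.callback import TrainingCallback
except ImportError:
    class TrainingCallback:
        """Stand-in base so the sketch still imports without a recent xgboost."""


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a string such as '1.5.2' into (1, 5, 2)."""
    return tuple(int(p) for p in re.findall(r"\d+", version)[:3])


def has_class_based_callbacks(version: str) -> bool:
    """Hypothetical check for which callback style a given version supports."""
    return parse_version(version) >= (1, 3, 0)


class PatienceStop(TrainingCallback):
    """Hypothetical callback: stop once a metric has not improved for `patience` rounds."""

    def __init__(self, data_name: str, metric: str, patience: int) -> None:
        super().__init__()
        self.data_name = data_name
        self.metric = metric
        self.patience = patience
        self.best = float("inf")
        self.stale_rounds = 0

    def after_iteration(
        self, model, epoch: int, evals_log: Dict[str, Dict[str, List[float]]]
    ) -> bool:
        # evals_log maps dataset name -> metric name -> history of values.
        latest = evals_log[self.data_name][self.metric][-1]
        if latest < self.best:
            self.best = latest
            self.stale_rounds = 0
        else:
            self.stale_rounds += 1
        # Returning True asks the trainer to stop.
        return self.stale_rounds >= self.patience
```

With a recent xgboost installed, an instance would be passed through the `callbacks` argument of `xgboost.train`. On 1.2.0 the fallback base class only keeps the module importable, and a function-style callback would be needed instead.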
Local integration test for resnet18/llvm:
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Terminated
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
0 | fused_nn_conv2d_add | 12870144 | 1 | 375.4210 | 34.2819 | 34.2819 | 32 | Y
1 | fused_nn_conv2d_add_1 | 12895232 | 1 | 398.5375 | 32.3564 | 32.3564 | 32 | Y
2 | fused_nn_conv2d_add_2 | 12945408 | 1 | 464.8020 | 27.8514 | 27.8514 | 32 | Y
3 | fused_layout_transform | 1 | 1 | 0.0002 | 5.7608 | 5.7608 | 2 | Y
4 | fused_nn_conv2d_add_nn_relu | 237633536 | 1 | 387.8015 | 612.7711 | 612.7711 | 32 | Y
5 | fused_nn_max_pool2d | 1806336 | 1 | 157.2717 | 11.4854 | 11.4854 | 32 | Y
6 | fused_nn_conv2d_add_nn_relu_1 | 231612416 | 2 | 383.7106 | 603.6122 | 1207.2245 | 32 | Y
7 | fused_nn_conv2d_add_add_nn_relu | 231813120 | 2 | 442.1804 | 524.2501 | 1048.5002 | 32 | Y
8 | fused_nn_conv2d_add_nn_relu_2 | 115806208 | 1 | 362.1544 | 319.7703 | 319.7703 | 32 | Y
9 | fused_nn_contrib_conv2d_winograd_without_weight_transform_add_nn_relu | 93227008 | 1 | 293.8712 | 317.2377 | 317.2377 | 32 | Y
10 | fused_nn_contrib_conv2d_winograd_without_weight_transform_add_add_nn_relu | 93327360 | 2 | 281.1145 | 331.9906 | 663.9812 | 32 | Y
11 | fused_nn_conv2d_add_nn_relu_3 | 115705856 | 1 | 437.5283 | 264.4534 | 264.4534 | 32 | Y
12 | fused_nn_contrib_conv2d_winograd_without_weight_transform_add_nn_relu_1 | 98600960 | 1 | 330.2098 | 298.6010 | 298.6010 | 32 | Y
13 | fused_nn_contrib_conv2d_winograd_without_weight_transform_add_add_nn_relu_1 | 98651136 | 2 | 298.1799 | 330.8444 | 661.6887 | 32 | Y
14 | fused_nn_conv2d_add_nn_relu_4 | 115655680 | 1 | 381.0609 | 303.5097 | 303.5097 | 32 | Y
15 | fused_nn_conv2d_add_nn_relu_5 | 231261184 | 1 | 408.4514 | 566.1902 | 566.1902 | 32 | Y
16 | fused_nn_conv2d_add_add_nn_relu_1 | 231286272 | 2 | 332.2502 | 696.1209 | 1392.2417 | 32 | Y
17 | fused_nn_adaptive_avg_pool2d | 25600 | 1 | 5.7029 | 4.4890 | 4.4890 | 32 | Y
18 | fused_layout_transform_reshape_squeeze | 1 | 1 | 0.0003 | 3.6907 | 3.6907 | 1 |
19 | fused_nn_dense_add | 1025000 | 1 | 161.2829 | 6.3553 | 6.3553 | 32 | Y
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
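The derived columns in the table above appear to follow from the raw ones: speed is FLOP divided by latency, and weighted latency is latency times weight. The sketch below checks this against row 6 (`fused_nn_conv2d_add_nn_relu_1`). The function names are hypothetical, and the relationship is inferred from the numbers rather than taken from the TVM source.

```python
def speed_gflops(flop: int, latency_us: float) -> float:
    """FLOP / seconds, expressed in units of 1e9 FLOP/s."""
    return flop / (latency_us * 1e-6) / 1e9


def weighted_latency_us(latency_us: float, weight: int) -> float:
    """Latency scaled by how many times the task occurs in the model."""
    return latency_us * weight


# Row 6 of the resnet18/llvm table.
flop, weight, latency_us = 231612416, 2, 603.6122

print(f"speed: {speed_gflops(flop, latency_us):.4f} GFLOPS")
print(f"weighted latency: {weighted_latency_us(latency_us, weight):.4f} us")
```

Small differences in the last digit against the table are expected, because the printed latency is already rounded to four decimals.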
Profiler table for resnet18/llvm:
ID | Name | Time (min) | Percentage
----------------------------------------------------------------------------
| Total | 61.9612 | 100.0000
1 | EvoSearch/Evolve/Mutation | 30.6629 | 49.4873
2 | EvoSearch/SampleInitPopulation | 8.4259 | 13.5987
3 | SendToBuilder | 8.4254 | 13.5979
4 | EvoSearch/Evolve/Misc | 6.0477 | 9.7604
5 | EvoSearch/Evolve/PredictNormalizedScore | 3.3436 | 5.3962
6 | SendToRunner | 2.3616 | 3.8115
7 | ApplyHistoryBest | 1.5547 | 2.5091
8 | TaskExtraction | 0.4576 | 0.7386
9 | MeasureCallback/UpdateCostModel | 0.1178 | 0.1901
10 | InitializeTask | 0.1092 | 0.1762
11 | MeasureCallback/AddToDatabase | 0.0181 | 0.0292
12 | EvoSearch/PickWithEpsGreedy | 0.0152 | 0.0245
13 | EvoSearch/PickBestFromDatabase | 0.0149 | 0.0241
14 | MeasureCallback/EchoStatistics | 0.0050 | 0.0081
15 | MeasureCallback/RemoveBuildArtifact | 0.0009 | 0.0015
16 | JoinRunnerFutures | 0.0003 | 0.0005
17 | EvoSearch/Evolve/Misc/CopyMeasuredWorkloads | 0.0000 | 0.0000
----------------------------------------------------------------------------
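The Percentage column of the profiler table is each stage's time as a share of the total. A short sketch using the top three rows above (the variable and function names are hypothetical):

```python
TOTAL_MIN = 61.9612

# (stage name, time in minutes), copied from the profiler table above.
stages = [
    ("EvoSearch/Evolve/Mutation", 30.6629),
    ("EvoSearch/SampleInitPopulation", 8.4259),
    ("SendToBuilder", 8.4254),
]


def percentage(time_min: float, total_min: float) -> float:
    """Share of total tuning time, in percent."""
    return 100.0 * time_min / total_min


for name, time_min in stages:
    print(f"{name:<35} {percentage(time_min, TOTAL_MIN):8.4f}")

top3_share = sum(percentage(t, TOTAL_MIN) for _, t in stages)
print(f"top three stages together: {top3_share:.1f}%")
```

Read this way, the three largest stages account for roughly three quarters of the resnet18/llvm tuning time, and `MeasureCallback/UpdateCostModel` (the stage that trains the XGBoost model) takes well under 1%.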
@tvm-bot rerun
bert base llvm:
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Terminated
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
0 | fused_take | 1 | 1 | 0.0002 | 5.2482 | 5.2482 | 1 |
1 | fused_nn_dense_add_fast_tanh | 1204224 | 1 | 114.1531 | 10.5492 | 10.5492 | 32 | Y
2 | fused_reshape_add_reshape_transpose_reshape | 49152 | 12 | 1.7456 | 28.1570 | 337.8840 | 1 | Y
3 | fused_variance | 147520 | 25 | 17.0011 | 8.6771 | 216.9272 | 32 | Y
4 | fused_mean | 49216 | 25 | 6.1650 | 7.9831 | 199.5783 | 32 | Y
5 | fused_cast_take_add | 49152 | 1 | 2.5341 | 19.3960 | 19.3960 | 2 | Y
6 | fused_reshape_add_reshape_transpose_reshape_1 | 49152 | 24 | 4.7436 | 10.3617 | 248.6808 | 1 | Y
7 | fused_reshape_divide_add | 98304 | 12 | 12.9576 | 7.5866 | 91.0392 | 2 | Y
8 | fused_nn_fast_softmax | 4374528 | 12 | 89.9165 | 48.6510 | 583.8123 | 32 | Y
9 | fused_reshape | 1 | 12 | 0.0000 | 117.9034 | 1414.8410 | 1 | Y
10 | fused_nn_batch_matmul | 6291456 | 24 | 141.6181 | 44.4255 | 1066.2123 | 32 | Y
11 | fused_reshape_transpose_reshape | 1 | 12 | 0.0000 | 29.6311 | 355.5735 | 1 | Y
12 | fused_nn_dense | 75497472 | 48 | 191.4222 | 394.4030 | 18931.3435 | 32 | Y
13 | fused_reshape_1 | 1 | 24 | 0.0001 | 10.0435 | 241.0438 | 1 | Y
14 | fused_nn_dense_1 | 301989888 | 12 | 178.7608 | 1689.3522 | 20272.2265 | 32 | Y
15 | fused_reshape_add_multiply_fast_erf_multiply_add_multiply_reshape | 15532032 | 12 | 4.0672 | 3818.8959 | 45826.7502 | 1 | Y
16 | fused_nn_dense_2 | 301989888 | 12 | 240.5859 | 1255.2267 | 15062.7200 | 32 | Y
17 | fused_reshape_add_add | 98304 | 24 | 12.5405 | 7.8389 | 188.1338 | 2 | Y
18 | fused_subtract_add_sqrt_divide_multiply_add | 196672 | 25 | 12.7443 | 15.4322 | 385.8043 | 2 | Y
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Profiler table:
ID | Name | Time (min) | Percentage
----------------------------------------------------------------------------
| Total | 15.5728 | 100.0000
1 | MeasureCallback/UpdateCostModel | 5.2182 | 33.5082
2 | EvoSearch/Evolve/PredictNormalizedScore | 2.4700 | 15.8609
3 | EvoSearch/Evolve/Mutation | 2.1395 | 13.7387
4 | SendToRunner | 1.5999 | 10.2737
5 | EvoSearch/Evolve/Misc | 1.5694 | 10.0778
6 | SendToBuilder | 0.9345 | 6.0006
7 | EvoSearch/SampleInitPopulation | 0.7653 | 4.9146
8 | ApplyHistoryBest | 0.5334 | 3.4250
9 | TaskExtraction | 0.1634 | 1.0490
10 | InitializeTask | 0.0280 | 0.1798
11 | EvoSearch/PickWithEpsGreedy | 0.0047 | 0.0304
12 | EvoSearch/PickBestFromDatabase | 0.0036 | 0.0232
13 | MeasureCallback/AddToDatabase | 0.0019 | 0.0125
14 | MeasureCallback/RemoveBuildArtifact | 0.0004 | 0.0028
15 | MeasureCallback/EchoStatistics | 0.0002 | 0.0010
16 | JoinRunnerFutures | 0.0002 | 0.0010
17 | EvoSearch/Evolve/Misc/CopyMeasuredWorkloads | 0.0000 | 0.0000
----------------------------------------------------------------------------
bert base cuda:
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Terminated
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
0 | fused_take | 1 | 1 | 0.0005 | 2.1319 | 2.1319 | 5 |
1 | fused_nn_dense_add_fast_tanh | 1204224 | 1 | 36.6140 | 32.8897 | 32.8897 | 32 | Y
2 | fused_reshape_add_reshape_transpose_reshape | 49152 | 12 | 13.5008 | 3.6407 | 43.6879 | 6 | Y
3 | fused_variance | 147520 | 25 | 65.9260 | 2.2377 | 55.9415 | 32 | Y
4 | fused_mean | 49216 | 25 | 21.9872 | 2.2384 | 55.9597 | 32 | Y
5 | fused_cast_take_add | 49152 | 1 | 20.9740 | 2.3435 | 2.3435 | 6 |
6 | fused_reshape_add_reshape_transpose_reshape_1 | 49152 | 24 | 20.6382 | 2.3816 | 57.1585 | 6 | Y
7 | fused_reshape_divide_add | 98304 | 12 | 43.8752 | 2.2405 | 26.8864 | 6 | Y
8 | fused_nn_fast_softmax | 4374528 | 12 | 1141.5252 | 3.8322 | 45.9861 | 32 | Y
9 | fused_reshape | 1 | 12 | 0.0005 | 2.1836 | 26.2035 | 6 | Y
10 | fused_nn_batch_matmul | 6291456 | 24 | 684.4451 | 9.1921 | 220.6093 | 32 | Y
11 | fused_reshape_transpose_reshape | 1 | 12 | 0.0005 | 2.1763 | 26.1151 | 6 | Y
12 | fused_nn_dense | 75497472 | 48 | 918.1956 | 82.2237 | 3946.7393 | 32 | Y
13 | fused_reshape_1 | 1 | 24 | 0.0005 | 2.1895 | 52.5487 | 6 | Y
14 | fused_nn_dense_1 | 301989888 | 12 | 2381.8300 | 126.7890 | 1521.4682 | 32 | Y
15 | fused_reshape_add_multiply_fast_erf_multiply_add_multiply_reshape | 15532032 | 12 | 4892.7944 | 3.1745 | 38.0936 | 6 | Y
16 | fused_nn_dense_2 | 301989888 | 12 | 1758.6493 | 171.7170 | 2060.6034 | 32 | Y
17 | fused_reshape_add_add | 98304 | 24 | 39.4395 | 2.4925 | 59.8207 | 6 | Y
18 | fused_subtract_add_sqrt_divide_multiply_add | 196672 | 25 | 72.4898 | 2.7131 | 67.8275 | 6 | Y
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Profiler table:
ID | Name | Time (min) | Percentage
----------------------------------------------------------------------------
| Total | 22.2403 | 100.0000
1 | EvoSearch/Evolve/PredictNormalizedScore | 9.5203 | 42.8065
2 | EvoSearch/Evolve/Mutation | 3.3615 | 15.1146
3 | SendToBuilder | 2.3562 | 10.5943
4 | EvoSearch/SampleInitPopulation | 2.3124 | 10.3975
5 | EvoSearch/Evolve/Misc | 2.1767 | 9.7870
6 | SendToRunner | 1.6900 | 7.5987
7 | ApplyHistoryBest | 0.3483 | 1.5662
8 | TaskExtraction | 0.2121 | 0.9535
9 | MeasureCallback/UpdateCostModel | 0.0500 | 0.2248
10 | EvoSearch/PickBestFromDatabase | 0.0158 | 0.0710
11 | InitializeTask | 0.0095 | 0.0429
12 | EvoSearch/PickWithEpsGreedy | 0.0069 | 0.0310
13 | MeasureCallback/AddToDatabase | 0.0029 | 0.0130
14 | MeasureCallback/RemoveBuildArtifact | 0.0008 | 0.0037
15 | MeasureCallback/EchoStatistics | 0.0006 | 0.0028
16 | JoinRunnerFutures | 0.0003 | 0.0012
17 | EvoSearch/Evolve/Misc/CopyMeasuredWorkloads | 0.0000 | 0.0000
----------------------------------------------------------------------------
resnet18 cuda:
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Terminated
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
0 | fused_nn_conv2d_add | 12870144 | 1 | 965.5146 | 13.3298 | 13.3298 | 32 | Y
1 | fused_nn_conv2d_add_1 | 12895232 | 1 | 1330.3102 | 9.6934 | 9.6934 | 32 | Y
2 | fused_nn_conv2d_add_2 | 12945408 | 1 | 2103.2869 | 6.1548 | 6.1548 | 32 | Y
3 | fused_layout_transform | 1 | 1 | 0.0002 | 5.0254 | 5.0254 | 6 | Y
4 | fused_nn_conv2d_add_nn_relu | 237633536 | 1 | 6085.8811 | 39.0467 | 39.0467 | 32 | Y
5 | fused_nn_max_pool2d | 1806336 | 1 | 328.9316 | 5.4915 | 5.4915 | 30 | Y
6 | fused_nn_contrib_conv2d_winograd_without_weight_transform_add_nn_relu | 128651264 | 2 | 2512.3525 | 51.2075 | 102.4150 | 32 | Y
7 | fused_nn_contrib_conv2d_winograd_without_weight_transform_add_add_nn_relu | 128851968 | 2 | 1360.9069 | 94.6810 | 189.3619 | 32 | Y
8 | fused_nn_conv2d_add_nn_relu_1 | 115806208 | 1 | 2482.7300 | 46.6447 | 46.6447 | 32 | Y
9 | fused_nn_contrib_conv2d_winograd_without_weight_transform_add_nn_relu_1 | 127045632 | 1 | 3352.8175 | 37.8922 | 37.8922 | 32 | Y
10 | fused_nn_contrib_conv2d_winograd_without_weight_transform_add_add_nn_relu_1 | 127145984 | 2 | 1854.8033 | 68.5496 | 137.0992 | 32 | Y
11 | fused_nn_conv2d_add_nn_relu_2 | 115705856 | 1 | 3359.4190 | 34.4422 | 34.4422 | 32 | Y
12 | fused_nn_contrib_conv2d_winograd_without_weight_transform_add_nn_relu_2 | 114903040 | 1 | 2106.9193 | 54.5360 | 54.5360 | 32 | Y
13 | fused_nn_contrib_conv2d_winograd_without_weight_transform_add_add_nn_relu_2 | 114953216 | 2 | 1723.1163 | 66.7124 | 133.4248 | 32 | Y
14 | fused_nn_conv2d_add_nn_relu_3 | 115655680 | 1 | 1007.9003 | 114.7491 | 114.7491 | 32 | Y
15 | fused_nn_contrib_conv2d_winograd_without_weight_transform_add_nn_relu_3 | 142132224 | 1 | 1615.3274 | 87.9897 | 87.9897 | 32 | Y
16 | fused_nn_contrib_conv2d_winograd_without_weight_transform_add_add_nn_relu_3 | 142157312 | 2 | 1053.2288 | 134.9729 | 269.9457 | 32 | Y
17 | fused_nn_adaptive_avg_pool2d | 25600 | 1 | 5.8995 | 4.3393 | 4.3393 | 32 | Y
18 | fused_layout_transform_reshape_squeeze | 1 | 1 | 0.0003 | 3.2615 | 3.2615 | 5 |
19 | fused_nn_dense_add | 1025000 | 1 | 68.4000 | 14.9854 | 14.9854 | 32 | Y
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Profiler table:
ID | Name | Time (min) | Percentage
----------------------------------------------------------------------------
| Total | 83.8914 | 100.0000
1 | EvoSearch/Evolve/Mutation | 48.0123 | 57.2314
2 | SendToBuilder | 13.7197 | 16.3541
3 | EvoSearch/SampleInitPopulation | 7.9949 | 9.5300
4 | EvoSearch/Evolve/PredictNormalizedScore | 4.0848 | 4.8691
5 | SendToRunner | 3.6898 | 4.3983
6 | EvoSearch/Evolve/Misc | 2.7077 | 3.2277
7 | MeasureCallback/UpdateCostModel | 1.8705 | 2.2297
8 | ApplyHistoryBest | 0.7764 | 0.9254
9 | TaskExtraction | 0.5058 | 0.6030
10 | InitializeTask | 0.0267 | 0.0318
11 | MeasureCallback/AddToDatabase | 0.0142 | 0.0170
12 | EvoSearch/PickBestFromDatabase | 0.0131 | 0.0157
13 | EvoSearch/PickWithEpsGreedy | 0.0100 | 0.0119
14 | MeasureCallback/EchoStatistics | 0.0037 | 0.0044
15 | MeasureCallback/RemoveBuildArtifact | 0.0019 | 0.0023
16 | JoinRunnerFutures | 0.0005 | 0.0006
17 | EvoSearch/Evolve/Misc/CopyMeasuredWorkloads | 0.0000 | 0.0000
----------------------------------------------------------------------------
mobilenetv2 on cuda:
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Terminated
-----------------------------------------------------------------------------------------------------------------------------------------------
0 | fused_layout_transform | 1 | 1 | 0.0004 | 2.2798 | 2.2798 | 6 |
1 | fused_nn_conv2d_add_clip | 22880256 | 1 | 3151.3187 | 7.2605 | 7.2605 | 32 | Y
2 | fused_nn_conv2d_add_clip_1 | 8429568 | 1 | 1285.2570 | 6.5587 | 6.5587 | 32 | Y
3 | fused_nn_conv2d_add | 13045760 | 1 | 2104.8376 | 6.1980 | 6.1980 | 32 | Y
4 | fused_nn_conv2d_add_clip_2 | 42147840 | 1 | 2994.8494 | 14.0734 | 14.0734 | 32 | Y
5 | fused_nn_conv2d_add_clip_3 | 6322176 | 1 | 682.4610 | 9.2638 | 9.2638 | 32 | Y
6 | fused_nn_conv2d_add_1 | 14525952 | 1 | 1936.7547 | 7.5002 | 7.5002 | 32 | Y
7 | fused_nn_conv2d_add_clip_4 | 9483264 | 1 | 1537.1009 | 6.1696 | 6.1696 | 32 | Y
8 | fused_nn_conv2d_add_add | 21826560 | 1 | 2005.0549 | 10.8858 | 10.8858 | 32 | Y
9 | fused_nn_conv2d_add_clip_5 | 23030784 | 2 | 1914.2627 | 12.0312 | 24.0623 | 32 | Y
10 | fused_nn_conv2d_add_clip_6 | 2370816 | 1 | 393.0634 | 6.0316 | 6.0316 | 32 | Y
11 | fused_nn_conv2d_add_2 | 7250432 | 1 | 917.3106 | 7.9040 | 7.9040 | 32 | Y
12 | fused_nn_conv2d_add_clip_7 | 3161088 | 2 | 262.2023 | 12.0559 | 24.1118 | 32 | Y
13 | fused_nn_conv2d_add_add_1 | 9683968 | 2 | 1061.2357 | 9.1252 | 18.2504 | 32 | Y
14 | fused_nn_conv2d_add_clip_8 | 10085376 | 3 | 737.1134 | 13.6823 | 41.0468 | 32 | Y
15 | fused_nn_conv2d_add_clip_9 | 790272 | 1 | 170.2160 | 4.6428 | 4.6428 | 32 | Y
16 | fused_nn_conv2d_add_3 | 4829440 | 1 | 957.4766 | 5.0439 | 5.0439 | 32 | Y
17 | fused_nn_conv2d_add_add_2 | 9658880 | 3 | 919.5057 | 10.5044 | 31.5133 | 32 | Y
18 | fused_nn_conv2d_add_clip_10 | 9859584 | 4 | 1410.7424 | 6.9889 | 27.9557 | 32 | Y
19 | fused_nn_conv2d_add_clip_11 | 1580544 | 4 | 361.8447 | 4.3680 | 17.4721 | 32 | Y
20 | fused_nn_conv2d_add_4 | 14469504 | 1 | 739.5858 | 19.5643 | 19.5643 | 32 | Y
21 | fused_nn_conv2d_add_clip_12 | 2370816 | 2 | 503.2051 | 4.7114 | 9.4229 | 32 | Y
22 | fused_nn_conv2d_add_add_3 | 21713664 | 2 | 1405.4021 | 15.4501 | 30.9003 | 32 | Y
23 | fused_nn_conv2d_add_clip_13 | 22014720 | 3 | 2486.7910 | 8.8527 | 26.5580 | 32 | Y
24 | fused_nn_conv2d_add_clip_14 | 592704 | 1 | 125.2444 | 4.7324 | 4.7324 | 32 | Y
25 | fused_nn_conv2d_add_5 | 9039520 | 1 | 410.3605 | 22.0282 | 22.0282 | 32 | Y
26 | fused_nn_conv2d_add_add_4 | 15068480 | 2 | 411.3220 | 36.6343 | 73.2685 | 32 | Y
27 | fused_nn_conv2d_add_clip_15 | 15193920 | 3 | 1503.2292 | 10.1075 | 30.3226 | 32 | Y
28 | fused_nn_conv2d_add_clip_16 | 987840 | 3 | 224.3443 | 4.4032 | 13.2097 | 32 | Y
29 | fused_nn_conv2d_add_6 | 30121280 | 1 | 1749.6604 | 17.2155 | 17.2155 | 32 | Y
30 | fused_nn_conv2d_add_clip_17 | 40328960 | 1 | 2609.1046 | 15.4570 | 15.4570 | 32 | Y
31 | fused_nn_adaptive_avg_pool2d | 64000 | 1 | 16.9965 | 3.7655 | 3.7655 | 32 | Y
32 | fused_layout_transform_reshape_squeeze | 1 | 1 | 0.0002 | 4.3296 | 4.3296 | 6 | Y
33 | fused_nn_dense_add | 2561000 | 1 | 66.7119 | 38.3890 | 38.3890 | 32 | Y
-----------------------------------------------------------------------------------------------------------------------------------------------
Profiler table:
ID | Name | Time (min) | Percentage
----------------------------------------------------------------------------
| Total | 82.0160 | 100.0000
1 | EvoSearch/Evolve/Mutation | 42.1468 | 51.3885
2 | SendToBuilder | 15.1365 | 18.4556
3 | EvoSearch/SampleInitPopulation | 7.9139 | 9.6492
4 | SendToRunner | 6.4504 | 7.8648
5 | EvoSearch/Evolve/PredictNormalizedScore | 2.7957 | 3.4087
6 | MeasureCallback/UpdateCostModel | 2.7350 | 3.3348
7 | EvoSearch/Evolve/Misc | 2.6240 | 3.1994
8 | ApplyHistoryBest | 1.1672 | 1.4232
9 | TaskExtraction | 0.4503 | 0.5490
10 | InitializeTask | 0.0250 | 0.0304
11 | MeasureCallback/AddToDatabase | 0.0198 | 0.0241
12 | EvoSearch/PickWithEpsGreedy | 0.0100 | 0.0122
13 | MeasureCallback/RemoveBuildArtifact | 0.0034 | 0.0041
14 | EvoSearch/PickBestFromDatabase | 0.0032 | 0.0039
15 | MeasureCallback/EchoStatistics | 0.0027 | 0.0033
16 | JoinRunnerFutures | 0.0013 | 0.0016
17 | EvoSearch/Evolve/Misc/CopyMeasuredWorkloads | 0.0000 | 0.0000
----------------------------------------------------------------------------
bert base on llvm 20k trials:
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Terminated
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
0 | fused_take | 1 | 1 | 0.0001 | 12.9686 | 12.9686 | 1 | Y
1 | fused_nn_dense_add_fast_tanh | 1204224 | 1 | 84.5479 | 14.2431 | 14.2431 | 32 | Y
2 | fused_reshape_add_reshape_transpose_reshape | 49152 | 12 | 5.3101 | 9.2562 | 111.0749 | 1 | Y
3 | fused_variance | 147520 | 25 | 21.8394 | 6.7548 | 168.8690 | 191 | Y
4 | fused_mean | 49216 | 25 | 11.7478 | 4.1894 | 104.7344 | 159 | Y
5 | fused_cast_take_add | 49152 | 1 | 3.6734 | 13.3805 | 13.3805 | 2 | Y
6 | fused_reshape_add_reshape_transpose_reshape_1 | 49152 | 24 | 0.4843 | 101.4931 | 2435.8337 | 1 | Y
7 | fused_reshape_divide_add | 98304 | 12 | 12.6803 | 7.7525 | 93.0296 | 2 | Y
8 | fused_nn_fast_softmax | 4374528 | 12 | 207.0953 | 21.1233 | 253.4791 | 288 | Y
9 | fused_reshape | 1 | 12 | 0.0001 | 12.0269 | 144.3223 | 1 | Y
10 | fused_nn_batch_matmul | 6291456 | 24 | 462.0523 | 13.6163 | 326.7919 | 384 | Y
11 | fused_reshape_transpose_reshape | 1 | 12 | 0.0000 | 66.8140 | 801.7686 | 1 | Y
12 | fused_nn_dense | 75497472 | 48 | 613.1287 | 123.1348 | 5910.4700 | 6656 |
13 | fused_reshape_1 | 1 | 24 | 0.0000 | 49.1952 | 1180.6855 | 1 | Y
14 | fused_nn_dense_1 | 301989888 | 12 | 664.1287 | 454.7159 | 5456.5913 | 6144 |
15 | fused_reshape_add_multiply_fast_erf_multiply_add_multiply_reshape | 15532032 | 12 | 32.6868 | 475.1782 | 5702.1385 | 1 | Y
16 | fused_nn_dense_2 | 301989888 | 12 | 662.0116 | 456.1701 | 5474.0410 | 6144 |
17 | fused_reshape_add_add | 98304 | 24 | 1.3333 | 73.7283 | 1769.4793 | 2 | Y
18 | fused_subtract_add_sqrt_divide_multiply_add | 196672 | 25 | 2.6162 | 75.1739 | 1879.3469 | 2 | Y
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Total trials: 20013
Total latency (us): 31853.2
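The two totals above can be reproduced from the table: total latency is the sum of the Weighted Latency column, and total trials is the sum of the Trials column. A quick check, with the values copied from the 20k-trial table (the list name `rows` is hypothetical):

```python
# (weighted latency in us, trials) for task IDs 0..18 of the 20k-trial run.
rows = [
    (12.9686, 1), (14.2431, 32), (111.0749, 1), (168.8690, 191),
    (104.7344, 159), (13.3805, 2), (2435.8337, 1), (93.0296, 2),
    (253.4791, 288), (144.3223, 1), (326.7919, 384), (801.7686, 1),
    (5910.4700, 6656), (1180.6855, 1), (5456.5913, 6144), (5702.1385, 1),
    (5474.0410, 6144), (1769.4793, 2), (1879.3469, 2),
]

total_latency_us = sum(latency for latency, _ in rows)
total_trials = sum(trials for _, trials in rows)

print(f"Total trials: {total_trials}")
print(f"Total latency (us): {total_latency_us:.1f}")
```

The sums also show where the tuning budget went: the three `fused_nn_dense*` tasks received 18944 of the trials.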
Profiler table:
ID | Name | Time (min) | Percentage
----------------------------------------------------------------------------
| Total | 359.8455 | 100.0000
1 | SendToRunner | 118.7806 | 33.0088
2 | EvoSearch/Evolve/PredictNormalizedScore | 62.0087 | 17.2320
3 | SendToBuilder | 56.9247 | 15.8192
4 | MeasureCallback/UpdateCostModel | 42.1284 | 11.7074
5 | EvoSearch/Evolve/Mutation | 40.9665 | 11.3845
6 | EvoSearch/Evolve/Misc | 21.9481 | 6.0993
7 | EvoSearch/SampleInitPopulation | 7.9898 | 2.2203
8 | EvoSearch/PickBestFromDatabase | 2.4416 | 0.6785
9 | ApplyHistoryBest | 0.5137 | 0.1428
10 | MeasureCallback/AddToDatabase | 0.1833 | 0.0509
11 | TaskExtraction | 0.1798 | 0.0500
12 | EvoSearch/PickWithEpsGreedy | 0.0540 | 0.0150
13 | MeasureCallback/RemoveBuildArtifact | 0.0453 | 0.0126
14 | InitializeTask | 0.0440 | 0.0122
15 | MeasureCallback/EchoStatistics | 0.0310 | 0.0086
16 | JoinRunnerFutures | 0.0118 | 0.0033
17 | EvoSearch/Evolve/Misc/CopyMeasuredWorkloads | 0.0116 | 0.0032
----------------------------------------------------------------------------
@zxybazh The pandas warning should be suppressed now with the last commit.
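For context, one common way to silence a third-party warning around a single call is a scoped filter from the standard `warnings` module. This is only a generic sketch, not the change made in the commit. The thread does not say which warning category pandas raised, so `FutureWarning` is an assumption, and `noisy_helper` and `call_quietly` are hypothetical names.

```python
import warnings


def noisy_helper() -> int:
    """Hypothetical stand-in for a library call that emits a warning."""
    warnings.warn("hypothetical deprecation notice", FutureWarning)
    return 42


def call_quietly(func):
    """Run func with FutureWarning suppressed, restoring the filters afterwards."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=FutureWarning)
        return func()


result = call_quietly(noisy_helper)
```

Scoping the filter with `catch_warnings()` keeps the suppression local, so the same warning category is still reported elsewhere in the process.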
@tvm-bot rerun