tvm
[Bug][Metaschedule] Tuning trial hanging after
I encountered this when trying to run this script over RPC on machines with V100 GPUs. Although this was done using Relax, @zxybazh thinks it can probably be triggered on mainline as well.
I ran ResNet-50 on a V100 with an input shape of (1, 3, 224, 224), using 5 tuning trials. Tuning started hanging on the first tuning task, fused_conv2d_add_relu. It appeared that failures were encountered during the task.
Output from the host:
input_name: input0
input_shape: [1, 3, 224, 224]
input_dtype: float32
/home/ubuntu/tvm-runtime/python/tvm/driver/build_module.py:267: UserWarning: target_host parameter is going to be deprecated. Please pass in tvm.target.Target(target, host=target_host) instead.
warnings.warn(
INFO:tvm.meta_schedule.runner.rpc_runner:RPCRunner: max_workers = 2
INFO:tvm.meta_schedule.tune:Working directory: /home/ubuntu/dump/
2022-08-05 12:13:55.897 INFO Logging directory: /home/ubuntu/dump/logs
2022-08-05 12:13:55.897 INFO Working directory: /home/ubuntu/dump/
2022-08-05 12:13:55.898 INFO Creating JSONDatabase. Workload at: /home/ubuntu/dump/database_workload.json. Tuning records at: /home/ubuntu/dump/database_tuning_record.json
2022-08-05 12:13:56.063 INFO LocalBuilder: max_workers = 24
2022-08-05 12:13:56.388 INFO Initializing Task #0: "layout_transform"
2022-08-05 12:13:56.459 INFO Initializing Task #1: "fused_conv2d_add_relu"
2022-08-05 12:13:56.726 INFO Initializing Task #2: "max_pool2d"
2022-08-05 12:13:56.866 INFO Initializing Task #3: "fused_conv2d1_add1_relu1"
2022-08-05 12:13:57.114 INFO Initializing Task #4: "fused_contrib_conv2d_winograd_without_weight_transform_add1_relu1"
2022-08-05 12:13:58.024 INFO Initializing Task #5: "fused_conv2d2_add2"
2022-08-05 12:13:58.231 INFO Initializing Task #6: "fused_conv2d2_add2_add3_relu2"
2022-08-05 12:13:58.532 INFO Initializing Task #7: "fused_conv2d3_add1_relu1"
2022-08-05 12:13:58.784 INFO Initializing Task #8: "fused_conv2d4_add4_relu3"
2022-08-05 12:13:59.033 INFO Initializing Task #9: "fused_conv2d5_add5_relu4"
2022-08-05 12:13:59.301 INFO Initializing Task #10: "fused_conv2d7_add6"
2022-08-05 12:13:59.518 INFO Initializing Task #11: "fused_conv2d6_add6_add7_relu5"
2022-08-05 12:13:59.823 INFO Initializing Task #12: "fused_conv2d8_add5_relu4"
2022-08-05 12:14:00.077 INFO Initializing Task #13: "fused_contrib_conv2d_winograd_without_weight_transform1_add5_relu4"
2022-08-05 12:14:00.771 INFO Initializing Task #14: "fused_conv2d9_add8_relu6"
2022-08-05 12:14:01.022 INFO Initializing Task #15: "fused_conv2d10_add9_relu7"
2022-08-05 12:14:01.290 INFO Initializing Task #16: "fused_conv2d12_add10"
2022-08-05 12:14:01.504 INFO Initializing Task #17: "fused_conv2d11_add10_add11_relu8"
2022-08-05 12:14:01.806 INFO Initializing Task #18: "fused_conv2d13_add9_relu7"
2022-08-05 12:14:02.057 INFO Initializing Task #19: "fused_contrib_conv2d_winograd_without_weight_transform2_add9_relu7"
2022-08-05 12:14:02.753 INFO Initializing Task #20: "fused_conv2d14_add12_relu9"
2022-08-05 12:14:03.003 INFO Initializing Task #21: "fused_conv2d15_add13_relu10"
2022-08-05 12:14:03.272 INFO Initializing Task #22: "fused_conv2d17_add14"
2022-08-05 12:14:03.486 INFO Initializing Task #23: "fused_conv2d16_add14_add15_relu11"
2022-08-05 12:14:03.788 INFO Initializing Task #24: "fused_conv2d18_add13_relu10"
2022-08-05 12:14:04.039 INFO Initializing Task #25: "fused_contrib_conv2d_winograd_without_weight_transform3_add13_relu10"
2022-08-05 12:14:04.739 INFO Initializing Task #26: "adaptive_avg_pool2d"
2022-08-05 12:14:04.865 INFO Initializing Task #27: "fused_layout_transform1_reshape_squeeze"
2022-08-05 12:14:05.006 INFO Initializing Task #28: "fused_dense_add16"
2022-08-05 12:14:05.113 INFO
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Terminated
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
0 | layout_transform | 1 | 1 | N/A | N/A | N/A | 0 |
1 | fused_conv2d_add_relu | 237633536 | 1 | N/A | N/A | N/A | 0 |
2 | max_pool2d | 1806336 | 1 | N/A | N/A | N/A | 0 |
3 | fused_conv2d1_add1_relu1 | 26091520 | 1 | N/A | N/A | N/A | 0 |
4 | fused_contrib_conv2d_winograd_without_weight_transform_add1_relu1 | 128651264 | 3 | N/A | N/A | N/A | 0 |
5 | fused_conv2d2_add2 | 103563264 | 1 | N/A | N/A | N/A | 0 |
6 | fused_conv2d2_add2_add3_relu2 | 105168896 | 3 | N/A | N/A | N/A | 0 |
7 | fused_conv2d3_add1_relu1 | 103161856 | 2 | N/A | N/A | N/A | 0 |
8 | fused_conv2d4_add4_relu3 | 206323712 | 1 | N/A | N/A | N/A | 0 |
9 | fused_conv2d5_add5_relu4 | 231411712 | 1 | N/A | N/A | N/A | 0 |
10 | fused_conv2d7_add6 | 205922304 | 1 | N/A | N/A | N/A | 0 |
11 | fused_conv2d6_add6_add7_relu5 | 103964672 | 4 | N/A | N/A | N/A | 0 |
12 | fused_conv2d8_add5_relu4 | 102961152 | 3 | N/A | N/A | N/A | 0 |
13 | fused_contrib_conv2d_winograd_without_weight_transform1_add5_relu4 | 127045632 | 3 | N/A | N/A | N/A | 0 |
14 | fused_conv2d9_add8_relu6 | 205922304 | 1 | N/A | N/A | N/A | 0 |
15 | fused_conv2d10_add9_relu7 | 231311360 | 1 | N/A | N/A | N/A | 0 |
16 | fused_conv2d12_add10 | 205721600 | 1 | N/A | N/A | N/A | 0 |
17 | fused_conv2d11_add10_add11_relu8 | 103362560 | 6 | N/A | N/A | N/A | 0 |
18 | fused_conv2d13_add9_relu7 | 102860800 | 5 | N/A | N/A | N/A | 0 |
19 | fused_contrib_conv2d_winograd_without_weight_transform2_add9_relu7 | 114903040 | 5 | N/A | N/A | N/A | 0 |
20 | fused_conv2d14_add12_relu9 | 205721600 | 1 | N/A | N/A | N/A | 0 |
21 | fused_conv2d15_add13_relu10 | 231261184 | 1 | N/A | N/A | N/A | 0 |
22 | fused_conv2d17_add14 | 205621248 | 1 | N/A | N/A | N/A | 0 |
23 | fused_conv2d16_add14_add15_relu11 | 103061504 | 3 | N/A | N/A | N/A | 0 |
24 | fused_conv2d18_add13_relu10 | 102810624 | 2 | N/A | N/A | N/A | 0 |
25 | fused_contrib_conv2d_winograd_without_weight_transform3_add13_relu10 | 142132224 | 2 | N/A | N/A | N/A | 0 |
26 | adaptive_avg_pool2d | 102400 | 1 | N/A | N/A | N/A | 0 |
27 | fused_layout_transform1_reshape_squeeze | 1 | 1 | N/A | N/A | N/A | 0 |
28 | fused_dense_add16 | 4097000 | 1 | N/A | N/A | N/A | 0 |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Total trials: 0
Total latency (us): 0
2022-08-05 12:14:05.114 INFO Scheduler picks Task #0: "layout_transform"
2022-08-05 12:14:06.380 INFO Sending 6 sample(s) to builder
2022-08-05 12:14:06.713 INFO Sending 6 sample(s) to runner
2022-08-05 12:14:06.713 INFO Scheduler picks Task #1: "fused_conv2d_add_relu"
The tail of the log for task 1 (excerpted, as it goes on for a long time):
[etc]
2022-08-05 12:36:14.188 INFO Sample-Init-Population summary:
Postproc #0 [meta_schedule.DisallowDynamicLoop(0x423d1d8)]: 0 failure(s)
Postproc #1 [meta_schedule.RewriteCooperativeFetch(0xf2ad228)]: 0 failure(s)
Postproc #2 [meta_schedule.RewriteUnboundBlock(0xf2ad258)]: 0 failure(s)
Postproc #3 [meta_schedule.RewriteParallelVectorizeUnroll(0x3d6c8e8)]: 0 failure(s)
Postproc #4 [meta_schedule.RewriteReductionBlock(0x4d449e8)]: 0 failure(s)
Postproc #5 [meta_schedule.VerifyGPUCode(0x45933f8)]: 1685504 failure(s)
2022-08-05 12:36:15.803 INFO Sample-Init-Population summary:
Postproc #0 [meta_schedule.DisallowDynamicLoop(0x423d1d8)]: 0 failure(s)
Postproc #1 [meta_schedule.RewriteCooperativeFetch(0xf2ad228)]: 0 failure(s)
Postproc #2 [meta_schedule.RewriteUnboundBlock(0xf2ad258)]: 0 failure(s)
Postproc #3 [meta_schedule.RewriteParallelVectorizeUnroll(0x3d6c8e8)]: 0 failure(s)
Postproc #4 [meta_schedule.RewriteReductionBlock(0x4d449e8)]: 0 failure(s)
Postproc #5 [meta_schedule.VerifyGPUCode(0x45933f8)]: 1687552 failure(s)
2022-08-05 12:36:17.411 INFO Sample-Init-Population summary:
Postproc #0 [meta_schedule.DisallowDynamicLoop(0x423d1d8)]: 0 failure(s)
Postproc #1 [meta_schedule.RewriteCooperativeFetch(0xf2ad228)]: 0 failure(s)
Postproc #2 [meta_schedule.RewriteUnboundBlock(0xf2ad258)]: 0 failure(s)
Postproc #3 [meta_schedule.RewriteParallelVectorizeUnroll(0x3d6c8e8)]: 0 failure(s)
Postproc #4 [meta_schedule.RewriteReductionBlock(0x4d449e8)]: 0 failure(s)
Postproc #5 [meta_schedule.VerifyGPUCode(0x45933f8)]: 1689600 failure(s)
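In the excerpt above, only VerifyGPUCode reports failures, and its counter keeps growing between consecutive Sample-Init-Population summaries. That would be consistent with the initial-population sampling retrying indefinitely while every candidate is rejected, though the log alone does not confirm the cause. The following is a minimal sketch for quantifying that growth from a saved log. It is plain Python with no TVM dependency; the helper names (`failure_counts`, `deltas`) and the inlined `SAMPLE_LOG` are illustrative assumptions rather than part of any TVM API, and the regular expression assumes the exact line format shown above.

```python
import re

# Assumed line format, taken from the log excerpt in this report:
#   Postproc #5 [meta_schedule.VerifyGPUCode(0x45933f8)]: 1685504 failure(s)
POSTPROC_RE = re.compile(
    r"Postproc #(\d+) \[meta_schedule\.(\w+)\(0x[0-9a-fA-F]+\)\]: (\d+) failure\(s\)"
)


def failure_counts(log_text):
    """Return {postproc_name: [count, ...]} with counts in log order."""
    counts = {}
    for match in POSTPROC_RE.finditer(log_text):
        _index, name, failures = match.groups()
        counts.setdefault(name, []).append(int(failures))
    return counts


def deltas(values):
    """Differences between consecutive entries of a list of counts."""
    return [later - earlier for earlier, later in zip(values, values[1:])]


# Hypothetical stand-in for the real log file: the last two postprocessor
# lines of each of the three summaries quoted above.
SAMPLE_LOG = """\
Postproc #4 [meta_schedule.RewriteReductionBlock(0x4d449e8)]: 0 failure(s)
Postproc #5 [meta_schedule.VerifyGPUCode(0x45933f8)]: 1685504 failure(s)
Postproc #4 [meta_schedule.RewriteReductionBlock(0x4d449e8)]: 0 failure(s)
Postproc #5 [meta_schedule.VerifyGPUCode(0x45933f8)]: 1687552 failure(s)
Postproc #4 [meta_schedule.RewriteReductionBlock(0x4d449e8)]: 0 failure(s)
Postproc #5 [meta_schedule.VerifyGPUCode(0x45933f8)]: 1689600 failure(s)
"""

counts = failure_counts(SAMPLE_LOG)
verify_deltas = deltas(counts["VerifyGPUCode"])
print(verify_deltas)  # [2048, 2048]
```

For the three summaries shown, the VerifyGPUCode counter increases by 2048 per summary while the other postprocessors stay at 0. Running the same helpers over the full log file would show whether that step size is constant for the whole hang.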