Integrate torch-mlir@ec6d7aa (onnx.resize op)
Unresolved IREE issue: ONNX "resize" op test failures #17345
Previous integrate (one torch-mlir commit earlier): Integrate both llvm-project@2083e97e (+1 ↩️, +1 🍒) and torch-mlir@bce800a3 #17330
Discord discussion: https://discord.com/channels/973663919757492264/1238540944383541330
Related torch-mlir onnx.resize patch: https://github.com/llvm/torch-mlir/pull/3013 (author: https://github.com/aldesilv)
Please follow https://iree.dev/developers/general/contributing/#obtaining-commit-access to get at least triage access to this repository so workflows can run without approval.
Updating the XFAIL lists here is going to be a bit bumpy, since I've had to turn off the main runners we use (https://github.com/iree-org/iree/issues/17370) and there is a new CUDA hang.
- We'll only be testing CUDA and Vulkan until the w7900 runner is back (https://github.com/iree-org/iree/pull/17375 will bring CPU testing back).
- I'm trying to make debugging the CUDA hang easier (https://github.com/iree-org/iree/actions/runs/9055030697/job/24911366379#step:9:45), but for now you may just be able to skip `test_resize_downsample_scales_linear` and hope that is enough here: https://github.com/iree-org/iree/blob/2a701d5b490b5ac4e41c1649d31b80b45285ce2c/build_tools/pkgci/external_test_suite/onnx_gpu_cuda.json#L13-L17 (see the sketch below).
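For illustration, a minimal sketch of that skip in onnx_gpu_cuda.json, assuming the file keeps its skip/expected-failure lists as top-level arrays of test paths (the `skip_run_tests` key comes from the update steps later in this thread; the exact path format and any other entries should be matched to whatever the file already contains):

```json
{
  "skip_run_tests": [
    "onnx/node/generated/test_resize_downsample_scales_linear"
  ]
}
```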
Can you at least sync this PR to include the newly disabled jobs?
Pushed a commit syncing this PR after a few of my fixes to the CI landed. Hopefully that should show the new test outcomes (passes/failures) and timeouts. We'll have to update the ROCm tests later - once the w7900 runner is back online and stable.
Ok, the tests that hang can be spotted easily now. Logs from this PR: https://github.com/iree-org/iree/actions/runs/9071387883/job/24925163523?pr=17358#step:9:3466
Note the `Failed: Timeout >30.0s` lines:
PASSED SHARK-TestSuite/iree_tests/onnx/node/generated/test_xor_bcast4v4d/model.mlir::gpu_cuda_t4_test
FAILED SHARK-TestSuite/iree_tests/onnx/node/generated/test_resize_downsample_scales_linear/model.mlir::gpu_cuda_t4_test
FAILED SHARK-TestSuite/iree_tests/onnx/node/generated/test_resize_downsample_scales_linear_align_corners/model.mlir::gpu_cuda_t4_test - Failed: Timeout >30.0s
FAILED SHARK-TestSuite/iree_tests/onnx/node/generated/test_resize_downsample_scales_linear_half_pixel_symmetric/model.mlir::gpu_cuda_t4_test - Failed: Timeout >30.0s
FAILED SHARK-TestSuite/iree_tests/onnx/node/generated/test_resize_downsample_sizes_linear_pytorch_half_pixel/model.mlir::gpu_cuda_t4_test
FAILED SHARK-TestSuite/iree_tests/onnx/node/generated/test_resize_upsample_scales_linear/model.mlir::gpu_cuda_t4_test
FAILED SHARK-TestSuite/iree_tests/onnx/node/generated/test_resize_upsample_scales_linear_align_corners/model.mlir::gpu_cuda_t4_test
FAILED SHARK-TestSuite/iree_tests/onnx/node/generated/test_resize_upsample_scales_linear_half_pixel_symmetric/model.mlir::gpu_cuda_t4_test
FAILED SHARK-TestSuite/iree_tests/onnx/node/generated/test_resize_upsample_sizes_nearest_floor_align_corners/model.mlir::gpu_cuda_t4_test
============ 8 failed, 581 passed, 643 xfailed in 292.09s (0:04:52) ============
So, to update the XFAIL lists:
- Download .json files from the summary page: https://github.com/iree-org/iree/actions/runs/9071387883?pr=17358
- Move those files to https://github.com/iree-org/iree/tree/main/build_tools/pkgci/external_test_suite
- **New:** edit those files manually (just the CUDA one in this case), putting any of the timeout tests in `skip_run_tests`, not `expected_run_failures`, so the hanging tests are skipped entirely rather than still being run as expected failures (see the sketch below).
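A rough sketch of that edit, assuming the same layout as the earlier snippet; the real onnx_gpu_cuda.json has more entries and the exact paths may differ, so treat these names as placeholders taken from the log above:

```json
{
  "skip_run_tests": [
    "onnx/node/generated/test_resize_downsample_scales_linear_align_corners",
    "onnx/node/generated/test_resize_downsample_scales_linear_half_pixel_symmetric"
  ],
  "expected_run_failures": [
    "onnx/node/generated/test_resize_downsample_sizes_linear_pytorch_half_pixel"
  ]
}
```

Only the tests that actually hit `Failed: Timeout >30.0s` need to move into `skip_run_tests`; plain failures can stay in `expected_run_failures` so they keep getting exercised.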