Integrate torch-mlir@ec6d7aa (onnx.resize op)
Unresolved IREE issue: ONNX "resize" op test failures #17345
Previous integrate (one torch-mlir commit earlier): Integrate both llvm-project@2083e97e (+1 ↩️, +1 🍒) and torch-mlir@bce800a3 #17330
Discord discussion: https://discord.com/channels/973663919757492264/1238540944383541330
Related torch-mlir onnx.resize patch: https://github.com/llvm/torch-mlir/pull/3013 (author: https://github.com/aldesilv)
Please follow https://iree.dev/developers/general/contributing/#obtaining-commit-access to get at least triage access to this repository so workflows can run without approval.
Updating the XFAIL lists here is going to be a bit bumpy, since I've had to turn off the main runners we use (https://github.com/iree-org/iree/issues/17370) and there is a new CUDA hang.
- We'll only be testing CUDA and Vulkan until the w7900 runner is back (https://github.com/iree-org/iree/pull/17375 will bring CPU testing back).
- I'm trying to make debugging the CUDA hang easier (https://github.com/iree-org/iree/actions/runs/9055030697/job/24911366379#step:9:45), but for now you may just be able to skip `test_resize_downsample_scales_linear` and hope that is enough here: https://github.com/iree-org/iree/blob/2a701d5b490b5ac4e41c1649d31b80b45285ce2c/build_tools/pkgci/external_test_suite/onnx_gpu_cuda.json#L13-L17 (see the sketch below).
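For illustration, a minimal sketch of that skip in onnx_gpu_cuda.json, assuming the file keeps its skip/expected-failure lists as top-level arrays of test paths (the `skip_run_tests` key comes from the update steps later in this thread; the exact path format and any other entries should be matched to whatever the file already contains):

```json
{
  "skip_run_tests": [
    "onnx/node/generated/test_resize_downsample_scales_linear"
  ]
}
```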
Can you at least sync this PR to include the newly disabled jobs?
Pushed a commit syncing this PR after a few of my fixes to the CI landed. Hopefully that should show the new test outcomes (passes/failures) and timeouts. We'll have to update the ROCm tests later - once the w7900 runner is back online and stable.
Ok, the tests that hang can be spotted easily now. Logs from this PR: https://github.com/iree-org/iree/actions/runs/9071387883/job/24925163523?pr=17358#step:9:3466
Note the `Failed: Timeout >30.0s` lines:
PASSED SHARK-TestSuite/iree_tests/onnx/node/generated/test_xor_bcast4v4d/model.mlir::gpu_cuda_t4_test
FAILED SHARK-TestSuite/iree_tests/onnx/node/generated/test_resize_downsample_scales_linear/model.mlir::gpu_cuda_t4_test
FAILED SHARK-TestSuite/iree_tests/onnx/node/generated/test_resize_downsample_scales_linear_align_corners/model.mlir::gpu_cuda_t4_test - Failed: Timeout >30.0s
FAILED SHARK-TestSuite/iree_tests/onnx/node/generated/test_resize_downsample_scales_linear_half_pixel_symmetric/model.mlir::gpu_cuda_t4_test - Failed: Timeout >30.0s
FAILED SHARK-TestSuite/iree_tests/onnx/node/generated/test_resize_downsample_sizes_linear_pytorch_half_pixel/model.mlir::gpu_cuda_t4_test
FAILED SHARK-TestSuite/iree_tests/onnx/node/generated/test_resize_upsample_scales_linear/model.mlir::gpu_cuda_t4_test
FAILED SHARK-TestSuite/iree_tests/onnx/node/generated/test_resize_upsample_scales_linear_align_corners/model.mlir::gpu_cuda_t4_test
FAILED SHARK-TestSuite/iree_tests/onnx/node/generated/test_resize_upsample_scales_linear_half_pixel_symmetric/model.mlir::gpu_cuda_t4_test
FAILED SHARK-TestSuite/iree_tests/onnx/node/generated/test_resize_upsample_sizes_nearest_floor_align_corners/model.mlir::gpu_cuda_t4_test
============ 8 failed, 581 passed, 643 xfailed in 292.09s (0:04:52) ============
So, to update the XFAIL lists:
- Download .json files from the summary page: https://github.com/iree-org/iree/actions/runs/9071387883?pr=17358
- Move those files to https://github.com/iree-org/iree/tree/main/build_tools/pkgci/external_test_suite
- **New:** edit those files manually (just the CUDA one in this case), putting any of the timeout tests in `skip_run_tests`, not `expected_run_failures`, so the hanging tests are skipped entirely rather than still being run as expected failures (see the sketch below).
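A rough sketch of that edit, assuming the same layout as the earlier snippet; the real onnx_gpu_cuda.json has more entries and the exact paths may differ, so treat these names as placeholders taken from the log above:

```json
{
  "skip_run_tests": [
    "onnx/node/generated/test_resize_downsample_scales_linear_align_corners",
    "onnx/node/generated/test_resize_downsample_scales_linear_half_pixel_symmetric"
  ],
  "expected_run_failures": [
    "onnx/node/generated/test_resize_downsample_sizes_linear_pytorch_half_pixel"
  ]
}
```

Only the tests that actually hit `Failed: Timeout >30.0s` need to move into `skip_run_tests`; plain failures can stay in `expected_run_failures` so they keep getting exercised.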