Scott Todd

405 comments by Scott Todd

For https://github.com/iree-org/iree/tree/main/experimental/regression_suite/tests/pregenerated, tests are still disabled: https://github.com/iree-org/iree/blob/258707898ae4a62d53468a51dc9dc44a1a8e22e4/.github/workflows/pkgci_regression_test.yml#L190-L197 Instructions for regenerating are at https://github.com/nod-ai/SHARK-Turbine/tree/main/models/turbine_models/custom_models#instructions, but that code hasn't been touched in a while, so it might need other updates too.

> I tried to regenerate the .mlirbc files for https://github.com/nod-ai/SHARK-TestSuite/tree/main/iree_tests/pytorch/models/resnet50 and https://github.com/nod-ai/SHARK-TestSuite/tree/main/iree_tests/pytorch/models/opt-125M, but hit issues with both. Need to apply more rigor to those frontend workflows.
>
> * resnet50...

Please follow https://iree.dev/developers/general/contributing/#obtaining-commit-access to get at least triage access to this repository so workflows can run without approval. ![image](https://github.com/iree-org/iree/assets/4010439/c9069ff0-6a1d-4f45-9506-2cc2bd9a378b)

Updating the XFAIL lists here is going to be a bit bumpy, since I've had to turn off the main runners used: https://github.com/iree-org/iree/issues/17370 and there is a new CUDA hang....

Pushed a commit syncing this PR after a few of my fixes to the CI landed. Hopefully that should show the new test outcomes (passes/failures) and timeouts. We'll have to...

Ok, the tests that hang can be spotted easily now. Logs from this PR: https://github.com/iree-org/iree/actions/runs/9071387883/job/24925163523?pr=17358#step:9:3466

Note the `Failed: Timeout >30.0s` lines:

```
PASSED SHARK-TestSuite/iree_tests/onnx/node/generated/test_xor_bcast4v4d/model.mlir::gpu_cuda_t4_test
FAILED SHARK-TestSuite/iree_tests/onnx/node/generated/test_resize_downsample_scales_linear/model.mlir::gpu_cuda_t4_test
FAILED SHARK-TestSuite/iree_tests/onnx/node/generated/test_resize_downsample_scales_linear_align_corners/model.mlir::gpu_cuda_t4_test - Failed:...
```
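If it helps with triage, here is a rough sketch of pulling the timed-out tests out of a saved log. It assumes the summary lines have the shape `FAILED <test id> - Failed: Timeout >30.0s`, as in the log linked above. The helper name `find_timeouts` and the sample test paths are made up for illustration and are not part of the test suite.

```python
import re

# Assumed line shape: "FAILED <test id> - Failed: Timeout >30.0s".
# search() is used instead of match() so a leading CI timestamp is tolerated.
TIMEOUT_LINE = re.compile(
    r"FAILED\s+(?P<test_id>\S+)\s+-\s+Failed: Timeout >(?P<seconds>[\d.]+)s"
)


def find_timeouts(log_text):
    """Return (test_id, timeout_seconds) for every timed-out test in the log."""
    hits = []
    for line in log_text.splitlines():
        found = TIMEOUT_LINE.search(line)
        if found:
            hits.append((found.group("test_id"), float(found.group("seconds"))))
    return hits


# Hypothetical log excerpt; the paths are placeholders, not real test names.
sample_log = """\
PASSED tests/example_a/model.mlir::gpu_cuda_t4_test
FAILED tests/example_b/model.mlir::gpu_cuda_t4_test - Failed: Timeout >30.0s
FAILED tests/example_c/model.mlir::gpu_cuda_t4_test - AssertionError
"""

if __name__ == "__main__":
    for test_id, seconds in find_timeouts(sample_log):
        print(f"{test_id} timed out after {seconds:g}s")
```

Only the timeout failure is reported; other failures (like the placeholder `AssertionError` line) are skipped, which keeps hangs separate from ordinary failures when updating XFAIL lists.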

Could run an experiment with and without tracing to quantify the cost. For Linux the code to change appears to be https://github.com/openxla/iree/blob/767a6112abffebd896ceb2f0107c0d603c2a338a/build_tools/benchmarks/run_benchmarks_on_linux.py#L87-L110
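A minimal sketch of what such an experiment could look like: time the same command a few times in each configuration and compare medians. The helpers `time_command` and `overhead_ratio` are hypothetical, and the commands below are placeholders rather than the real benchmark invocations, which would come from the script linked above.

```python
import statistics
import subprocess
import sys
import time


def time_command(cmd, repeats=3):
    """Run cmd `repeats` times and return the wall-clock duration of each run."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(
            cmd, check=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        durations.append(time.perf_counter() - start)
    return durations


def overhead_ratio(baseline, traced):
    """Ratio of median traced time to median baseline time (1.0 = no cost)."""
    return statistics.median(traced) / statistics.median(baseline)


if __name__ == "__main__":
    # Placeholder commands: substitute the benchmark command without and with
    # tracing enabled. These two just start the Python interpreter.
    baseline_cmd = [sys.executable, "-c", "pass"]
    traced_cmd = [sys.executable, "-c", "pass"]

    baseline = time_command(baseline_cmd)
    traced = time_command(traced_cmd)
    print(f"tracing overhead: {overhead_ratio(baseline, traced):.2f}x")
```

Medians are used instead of means so a single slow run on a noisy runner does not dominate the comparison.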

Another thing we could do would be to set `--iree-hal-executable-debug-level=3` to embed source files, since that works now (thanks to https://github.com/openxla/iree/pull/16757). Traces would be more useful with that. But yeah...

> What if we only disabled Tracy captures on PR runs?

That seems reasonable to me. If someone wants to see them on presubmit they could even just edit the...

@pzread how do you feel about disabling Tracy on PR runs, or making it an extra opt-in?