Finish LinalgExt operation support on all backends
One of the issues faced during SDXL support (https://github.com/openxla/iree/pull/16854) was missing support, on all codegen backends (i.e., CPU, SPIR-V, and LLVMGPU), for operations added in LinalgExt.
Main Issues
- `iree_linalg_ext.attention` (https://github.com/openxla/iree/blob/2cdf1452bb2f877baf8723ab567363094bea10bd/compiler/src/iree/compiler/Dialect/LinalgExt/IR/LinalgExtOps.td#L514): The main issue here was that the `TileAndDecomposeAttentionPass` is not really tested on any end-to-end compilation path. An efficient compilation of this op was built up using a transform dialect script that was custom tuned for a single architecture, so it was hard to test models containing this operation on any other hardware.
- `iree_linalg_ext.winograd.input_transform` (https://github.com/openxla/iree/blob/2cdf1452bb2f877baf8723ab567363094bea10bd/compiler/src/iree/compiler/Dialect/LinalgExt/IR/LinalgExtOps.td#L1043): This operation was working on the SPIR-V and CPU backends, but not on the LLVMGPU backend. Again, this wasn't tested end-to-end on all backends, but it had some coverage on the CPU and SPIR-V backends (https://github.com/openxla/iree/blob/main/tests/e2e/linalg_ext_ops/winograd_input.mlir), so it was relatively easy to get working on the LLVMGPU backend.
- `iree_linalg_ext.winograd.filter_transform`: This operation does not actually exist. The filter transform for Winograd was implemented by constant-folding the constant filter weights. To support this, the filters for the convolution needed to be converted from resources to inline constants, which were then evaluated (very slowly) at compile time.
- `iree_linalg_ext.winograd.output_transform`: This operation was working on the SPIR-V and CPU backends, but not on the LLVMGPU backend. Again, this wasn't tested end-to-end on all backends, but it had some coverage on the CPU and SPIR-V backends (https://github.com/openxla/iree/blob/main/tests/e2e/linalg_ext_ops/winograd_output.mlir), so it was relatively easy to get working on the LLVMGPU backend.
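For reference, a simplified sketch of the math behind these ops (notation is illustrative, ignoring masks, scaling variants, and precision details in the actual op definitions). Attention computes a softmax-weighted product, while Winograd convolution factors an output tile into three transforms, of which only the filter transform depends solely on the (constant) weights and can therefore be folded at compile time:

```latex
% Attention (simplified):
O = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d}}\right) V

% Winograd convolution F(m, r) on an input tile d and filter g:
%   B^T d B       -- input transform  (winograd.input_transform)
%   G g G^T       -- filter transform (const-folded, since g is constant)
%   A^T [...] A   -- output transform (winograd.output_transform)
Y = A^{\top}\left[\,(G g G^{\top}) \odot (B^{\top} d B)\,\right] A
```

Because `G g G^T` involves only the filter, "filter transform" support reduces to constant evaluation, which is why the missing op could be worked around via const folding above.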
Covered commits
- https://github.com/openxla/iree/pull/16854/commits/a38e893cb0f0c9b032b4619edda349eff2a0f152
- https://github.com/openxla/iree/pull/16854/commits/31effe976735ceaec4edf71d0a0d9b1f75a63648
- Unsubmitted change https://github.com/openxla/iree/pull/16862
Immediate next steps
- Make `iree_linalg_ext.attention` work on all backends (at least the CPU and LLVMGPU backends) and have them tested in CI. They should be relatively functional on different architectures, which will make them robust and easily portable.
- More in-tree end-to-end tests are needed to ensure op support. Even the modest testing of `iree_linalg_ext.winograd.input_transform` and `iree_linalg_ext.winograd.output_transform` on the CPU and SPIR-V backends made it easy to port them to the LLVMGPU backend.
- The `TileAndDecomposeAttentionPass` needs to be fixed. This might require re-evaluating the pass implementation to use the `PartialReductionTilingOpInterface`.
- Add an `iree_linalg_ext.filter_transform` operation to the LinalgExt dialect. We also need to make sure that the const_eval framework in IREE can pick up and fold away these operations.
- Add more testing for the `iree_linalg_ext.winograd.input_transform` and `iree_linalg_ext.winograd.output_transform` ops, both by themselves and via tests that convert a convolution into Winograd form and check that the whole pipeline works.
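As a reference point for such tests, the 1-D F(2,3) Winograd algorithm can be checked numerically against a direct convolution. This is a minimal NumPy sketch using the standard F(2,3) transform matrices; the tile sizes and matrices are illustrative and not necessarily the ones IREE's ops use:

```python
import numpy as np

# Standard F(2,3) Winograd matrices (illustrative; IREE may use other tile sizes).
BT = np.array([[1, 0, -1,  0],
               [0, 1,  1,  0],
               [0, -1, 1,  0],
               [0, 1,  0, -1]], dtype=np.float64)
G = np.array([[1.0,  0.0, 0.0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0,  0.0, 1.0]])
AT = np.array([[1, 1,  1,  0],
               [0, 1, -1, -1]], dtype=np.float64)

d = np.array([1.0, 2.0, 3.0, 4.0])  # input tile (4 samples)
g = np.array([1.0, 1.0, 1.0])       # constant 3-tap filter

U = G @ g         # filter transform: depends only on g, so const-foldable
V = BT @ d        # input transform  (what winograd.input_transform computes)
y = AT @ (U * V)  # output transform (what winograd.output_transform computes)

# Reference: direct "valid" correlation producing 2 outputs.
ref = np.array([np.dot(d[i:i + 3], g) for i in range(2)])
print(y, ref)  # both are [6. 9.]
assert np.allclose(y, ref)
```

An end-to-end test along these lines (convolution vs. decomposed transforms) would catch per-backend breakage in any of the three transform stages.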
What's the latest status here? Do we want to use this as a tracking issue? A few of us are noticing and getting blocked by uneven support for these LinalgExt ops.
The winograd op support has been landed to a great extent. There are CPU and ROCm tests. Attention is in progress. Which ops are you having issues with?
> Which ops are you having issues with?

Mainly attention, but I can't tell easily, and that's the larger problem. There are several inactive issues like this one and https://github.com/iree-org/iree/issues/17467 saying things are incomplete, and test coverage is mixed across backends.
- This CPU test for SDXL (default flags) has been XFAIL'd for a few weeks, I think due to attention: https://github.com/iree-org/iree/blob/97fbe5f36d7a82a85838b68622d64ab43790d749/build_tools/pkgci/external_test_suite/pytorch_models_cpu_llvm_task.json#L18-L19
- This GPU ROCm test for SDXL (default flags) never worked? https://github.com/iree-org/iree/blob/97fbe5f36d7a82a85838b68622d64ab43790d749/build_tools/pkgci/external_test_suite/pytorch_models_gpu_rocm_gfx90a.json#L19
- These GPU Vulkan tests for SDXL (default flags) never worked? https://github.com/iree-org/iree/blob/97fbe5f36d7a82a85838b68622d64ab43790d749/build_tools/pkgci/external_test_suite/pytorch_models_gpu_vulkan.json#L15-L17
- None of the tests in `tests/e2e/linalg_ext_ops` are running on ROCm/HIP.
- Of the tests in `tests/e2e/linalg_ext_ops`, some ops from https://iree.dev/reference/mlir-dialects/LinalgExt/ are not included, and many are marked excluded on various backends (there is no XFAIL support there, so we won't even know if they start passing).