[codegen] [gpu]: SD3 MMDiT attention dispatch fails on LinalgExtToLoops for amdgpu targets
What happened?
Error log:
(turb.env) PS C:\Users\eagarvey\SHARK\SHARK-Turbine> iree-compile --iree-input-type=torch --iree-vm-bytecode-module-output-format=flatbuffer-binary --iree-hal-target-backends=rocm --mlir-print-debuginfo=false --mlir-print-op-on-diagnostic=false --iree-hal-target-backends=rocm --iree-hip-target=gfx1103 --iree-vm-bytecode-module-output-format=flatbuffer-binary .\sd3_mmdit_gfx1103_dps\dispatch_27_attn.mlir
.\sd3_mmdit_gfx1103_dps\dispatch_27_attn.mlir:2:3: error: failed to run translation of source executable to target executable for backend #hal.executable.target<"rocm", "rocm-hsaco-fb", {iree.gpu.target = #iree_gpu.target<arch = "gfx1103", features = "", wgp = <compute = fp64|fp32|fp16|int64|int32|int16|int8, storage = b64|b32|b16|b8, subgroup = shuffle|arithmetic, dot = dp4xi8toi32, mma = [<WMMA_F32_16x16x16_F16>, <WMMA_F16_16x16x16_F16>, <WMMA_I32_16x16x16_I8>], subgroup_size_choices = [32, 64], max_workgroup_sizes = [1024, 1024, 1024], max_thread_count_per_workgroup = 1024, max_workgroup_memory_bytes = 65536, max_workgroup_counts = [2147483647, 2147483647, 2147483647]>>, ukernels = "none"}>
hal.executable.variant public @rocm_hsaco_fb target(<"rocm", "rocm-hsaco-fb", {iree.gpu.target = #iree_gpu.target<arch = "gfx1103", features = "", wgp = <compute = fp64|fp32|fp16|int64|int32|int16|int8, storage = b64|b32|b16|b8, subgroup = shuffle|arithmetic, dot = dp4xi8toi32, mma = [<WMMA_F32_16x16x16_F16>, <WMMA_F16_16x16x16_F16>, <WMMA_I32_16x16x16_I8>], subgroup_size_choices = [32, 64], max_workgroup_sizes = [1024, 1024, 1024], max_thread_count_per_workgroup = 1024, max_workgroup_memory_bytes = 65536, max_workgroup_counts = [2147483647, 2147483647, 2147483647]>>, ukernels = "none"}>) {
^
failed to translate executables
- Original MLIR: Azure
- Dispatch IR: Azure
- IR dump: Azure
Steps to reproduce your issue
- Download IR above
- Run:
iree-compile --iree-input-type=torch --iree-vm-bytecode-module-output-format=flatbuffer-binary --iree-hal-target-backends=rocm --iree-hip-target=gfx1103 --mlir-print-debuginfo=false --mlir-print-op-on-diagnostic=false dispatch_27_attn.mlir -o dispatch_27_attn.vmfb
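If it helps triage, the same invocation can be re-run with the stock MLIR/IREE debug options to capture per-pass IR and the per-dispatch executable sources. This is only a sketch: the dump directory names and the pass_dump.txt path are placeholders, not artifacts attached to this issue.

```shell
# Triage variant of the repro command (placeholder output paths).
# --mlir-disable-threading keeps the per-pass IR prints in order;
# the IR printing goes to stderr, hence the 2> redirect.
iree-compile --iree-input-type=torch --iree-hal-target-backends=rocm --iree-hip-target=gfx1103 --iree-hal-dump-executable-sources-to=dump/sources --iree-hal-dump-executable-intermediates-to=dump/intermediates --mlir-disable-threading --mlir-print-ir-before-all dispatch_27_attn.mlir -o dispatch_27_attn.vmfb 2> pass_dump.txt
```

The IR immediately before the failing lowering (the LinalgExtToLoops step named in the title) should then be visible near the end of pass_dump.txt.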
What component(s) does this issue relate to?
Compiler
Version information
IREE compiler version 20240927.1029 @ 76c3e61d563dbc22b74aa9d3d79c11c24a799697
Additional context
In the past, the shapes in this model have required a masked attention implementation driven by transform dialect scripts, since they do not match the available MMA intrinsics. I'm not sure whether that has any bearing on the error reported here.
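For reference, a transform dialect spec is normally supplied to iree-compile via a library flag, roughly as sketched below. This is only illustrative: attention_spec.mlir is a placeholder name, not a file attached to this issue, and no spec was used in the failing command above.

```shell
# Sketch of how the masked-attention transform dialect spec is normally
# passed in. "attention_spec.mlir" is a hypothetical placeholder filename.
iree-compile --iree-input-type=torch --iree-hal-target-backends=rocm --iree-hip-target=gfx1103 --iree-codegen-transform-dialect-library=attention_spec.mlir dispatch_27_attn.mlir -o dispatch_27_attn.vmfb
```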
The compile flags above are stripped down from those normally used to compile these models; since the error reproduces without the experimental flags, I thought it best to leave them out.