[codegen] [gpu]: SD3 MMDiT attention dispatch fails on LinalgExtToLoops for amdgpu targets
What happened?
Error log:
(turb.env) PS C:\Users\eagarvey\SHARK\SHARK-Turbine> iree-compile --iree-input-type=torch --iree-vm-bytecode-module-output-format=flatbuffer-binary --iree-hal-target-backends=rocm --mlir-print-debuginfo=false --mlir-print-op-on-diagnostic=false --iree-hal-target-backends=rocm --iree-hip-target=gfx1103 --iree-vm-bytecode-module-output-format=flatbuffer-binary .\sd3_mmdit_gfx1103_dps\dispatch_27_attn.mlir
.\sd3_mmdit_gfx1103_dps\dispatch_27_attn.mlir:2:3: error: failed to run translation of source executable to target executable for backend #hal.executable.target<"rocm", "rocm-hsaco-fb", {iree.gpu.target = #iree_gpu.target<arch = "gfx1103", features = "", wgp = <compute = fp64|fp32|fp16|int64|int32|int16|int8, storage = b64|b32|b16|b8, subgroup = shuffle|arithmetic, dot = dp4xi8toi32, mma = [<WMMA_F32_16x16x16_F16>, <WMMA_F16_16x16x16_F16>, <WMMA_I32_16x16x16_I8>], subgroup_size_choices = [32, 64], max_workgroup_sizes = [1024, 1024, 1024], max_thread_count_per_workgroup = 1024, max_workgroup_memory_bytes = 65536, max_workgroup_counts = [2147483647, 2147483647, 2147483647]>>, ukernels = "none"}>
hal.executable.variant public @rocm_hsaco_fb target(<"rocm", "rocm-hsaco-fb", {iree.gpu.target = #iree_gpu.target<arch = "gfx1103", features = "", wgp = <compute = fp64|fp32|fp16|int64|int32|int16|int8, storage = b64|b32|b16|b8, subgroup = shuffle|arithmetic, dot = dp4xi8toi32, mma = [<WMMA_F32_16x16x16_F16>, <WMMA_F16_16x16x16_F16>, <WMMA_I32_16x16x16_I8>], subgroup_size_choices = [32, 64], max_workgroup_sizes = [1024, 1024, 1024], max_thread_count_per_workgroup = 1024, max_workgroup_memory_bytes = 65536, max_workgroup_counts = [2147483647, 2147483647, 2147483647]>>, ukernels = "none"}>) {
^
failed to translate executables
- Original MLIR: Azure
- Dispatch IR: Azure
- IR dump: Azure
Steps to reproduce your issue
- Download IR above
- Run:
iree-compile --iree-input-type=torch --iree-vm-bytecode-module-output-format=flatbuffer-binary --iree-hal-target-backends=rocm --iree-hip-target=gfx1103 --mlir-print-debuginfo=false --mlir-print-op-on-diagnostic=false dispatch_27_attn.mlir -o dispatch_27_attn.vmfb
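If it helps triage, the same invocation can be re-run with the stock MLIR/IREE debug options to capture per-pass IR and the per-dispatch executable sources. This is only a sketch: the dump directory names and the pass_dump.txt path are placeholders, not artifacts attached to this issue.

```shell
# Triage variant of the repro command (placeholder output paths).
# --mlir-disable-threading keeps the per-pass IR prints in order;
# the IR printing goes to stderr, hence the 2> redirect.
iree-compile --iree-input-type=torch --iree-hal-target-backends=rocm --iree-hip-target=gfx1103 --iree-hal-dump-executable-sources-to=dump/sources --iree-hal-dump-executable-intermediates-to=dump/intermediates --mlir-disable-threading --mlir-print-ir-before-all dispatch_27_attn.mlir -o dispatch_27_attn.vmfb 2> pass_dump.txt
```

The IR immediately before the failing lowering (the LinalgExtToLoops step named in the title) should then be visible near the end of pass_dump.txt.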
What component(s) does this issue relate to?
Compiler
Version information
IREE compiler version 20240927.1029 @ 76c3e61d563dbc22b74aa9d3d79c11c24a799697
Additional context
In the past, the shapes in this model have required a masked attention implementation driven by transform dialect scripts, since they do not match the available MMA intrinsics. I'm not sure whether that has any bearing on the error reported here.
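For reference, a transform dialect spec is normally supplied to iree-compile via a library flag, roughly as sketched below. This is only illustrative: attention_spec.mlir is a placeholder name, not a file attached to this issue, and no spec was used in the failing command above.

```shell
# Sketch of how the masked-attention transform dialect spec is normally
# passed in. "attention_spec.mlir" is a hypothetical placeholder filename.
iree-compile --iree-input-type=torch --iree-hal-target-backends=rocm --iree-hip-target=gfx1103 --iree-codegen-transform-dialect-library=attention_spec.mlir dispatch_27_attn.mlir -o dispatch_27_attn.vmfb
```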
The compile flags above are stripped down from those normally used to compile these models; since the error reproduces without the experimental flags, I thought it best to leave them out.