iree issues

[codegen] [gpu]: SD3 MMDiT attention dispatch fails on LinalgExtToLoops for amdgpu targets

1

### What happened? Error log: ``` (turb.env) PS C:\Users\eagarvey\SHARK\SHARK-Turbine> iree-compile --iree-input-type=torch --iree-vm-bytecode-module-output-format=flatbuffer-binary --iree-hal-target-backends=rocm --mlir-print-debuginfo=false --mlir-print-op-on-diagnostic=false --iree-hal-target-backends=rocm --iree-hip-target=gfx1103 --iree-vm-bytecode-module-output-format=flatbuffer-binary .\sd3_mmdit_gfx1103_dps\dispatch_27_attn.mlir .\sd3_mmdit_gfx1103_dps\dispatch_27_attn.mlir:2:3: error: failed to run translation of source executable to target...

monorimet

bug 🐞

help wanted

[Codegen] Add pass to convert splat constants to fills

3

Splat constants are the non-canonical form for codegen; we instead prefer fills for two reasons: 1. Consistency with dynamic shapes 2. Fills are tilable and compose well with tile +...

qedawkins

benchmarks:comp-stats

DNS - check bazel deps

hanhanW

[DT] Plans for the buffer allocation in data-tiling

9

## How I think about buffer allocation in data-tiling 1. The default path is now materializing encodings at a very early stage (i.e., GlobalOpt), while we want to build the late...

hanhanW

codegen

[ROCM] Evaluate whether we can attach `amdgpu-no-implicitarg-ptr` to our generated functions.

If any implicit argument is used LLVM will reserve 256 bytes of kernarg space and emit metadata requiring the runtime to populate all implicit arguments. The only way to control...

benvanik

performance ⚡

codegen/rocm

Robustify upstream LICM for zero-trip count loops and other loop kinds

1

Currently the [upstream pass for loop invariant code motion](https://github.com/llvm/llvm-project/blob/7f1b465c6ae476e59dc90652d58fc648932d23b1/mlir/lib/Transforms/LoopInvariantCodeMotion.cpp#L47) performs hoisting on all loops independent of loop type or loop bounds. This has two issues: 1. This allows hoisting out...

qedawkins

enhancement ➕

good first issue 🌱

[GlobalOptimization] 1x1 filter convolutions not converted to matmul

3

The `Convert1x1FilterConvToMatmul` pass currently fails when there is a non-unit batch N dimension. In such cases, the transformation is still possible, and the N dimension should be folded into the...

Max191
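The claim in the preview above (a 1x1 filter convolution with a non-unit batch dimension can still be expressed as a matmul) can be checked with a small sketch. This is not the IREE pass itself. It assumes an NHWC input layout and a filter of shape `[C, F]`, and it assumes that N is collapsed together with H and W into a single row dimension; the preview is truncated before it says what N is folded into, so that choice is an illustration only. The function names `conv_1x1_nhwc` and `conv_1x1_as_matmul` are hypothetical.

```python
# Illustrative sketch only: pure Python, NHWC input x[n][h][w][c] and a
# 1x1 filter stored as w[c][f]. Layout and folding choice are assumptions.

def conv_1x1_nhwc(x, w):
    """Direct 1x1 convolution: every pixel's channel vector is multiplied by w."""
    num_c, num_f = len(w), len(w[0])
    return [
        [
            [
                [sum(px[c] * w[c][f] for c in range(num_c)) for f in range(num_f)]
                for px in row
            ]
            for row in image
        ]
        for image in x
    ]


def conv_1x1_as_matmul(x, w):
    """Same result via one matmul, with N, H and W collapsed into M = N*H*W."""
    n, h, wd = len(x), len(x[0]), len(x[0][0])
    num_c, num_f = len(w), len(w[0])

    # Collapse [N, H, W, C] into [M, C]; each row is one pixel's channels.
    lhs = [px for image in x for row in image for px in row]

    # Plain matmul: [M, C] x [C, F] -> [M, F].
    out = [
        [sum(r[c] * w[c][f] for c in range(num_c)) for f in range(num_f)]
        for r in lhs
    ]

    # Expand [M, F] back to [N, H, W, F] using the same row-major order.
    return [
        [[out[(i * h + j) * wd + k] for k in range(wd)] for j in range(h)]
        for i in range(n)
    ]


# Small integer example with a non-unit batch: N=2, H=2, W=3, C=2, F=3.
x = [
    [[[n * 100 + h * 10 + k + c for c in range(2)] for k in range(3)] for h in range(2)]
    for n in range(2)
]
w = [[1, 2, 3], [4, 5, 6]]
assert conv_1x1_nhwc(x, w) == conv_1x1_as_matmul(x, w)
```

Integer inputs are used so the two results can be compared exactly; with floating-point data the comparison would need a tolerance.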

[Codegen] Remove wrong usages of OptimizeVectorTransfer

Groverkss

Numeric issues on AMDGPU with f32 elementwise mul + f16 trunc

18

For this elementwise + pad dispatch ``` func.func @main(%8 : tensor, %9 : tensor) -> tensor { %c0_f16 = arith.constant 0.0 : f16 %13 = tensor.empty() : tensor %14 =...

nirvedhmeshram

Better error message when device not found

8

### Request description In this issue, https://github.com/nod-ai/SHARK-Platform/issues/264 I encountered an error message that looked like ``` ValueError: :0: NOT_FOUND; HAL device `__device_0` not found or unavailable: #hal.device.target; ``` It would...

renxida

enhancement ➕
