Kunwar Grover

Results 41 issues of Kunwar Grover

This patch adds support for lowering attention through the VectorDistribution pipeline. Currently, it has the following limitations, which will be fixed in follow-ups: - Only 1 subgroup is used - There...

Depends on: https://github.com/iree-org/iree/pull/17536

This pass is similar to loop-invariant-code-motion but looks for loop-invariant subsets instead. This pass is required for post-vectorization cleanups.
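The idea of hoisting a loop-invariant subset can be illustrated with a plain-Python analogy. This is only a sketch, not the pass itself: the function names and the list-slice model of a "subset" are hypothetical, and it assumes the same slice is extracted and written back on every iteration.

```python
def accumulate_naive(buffer, updates, start, size):
    """Extract and re-insert the same slice on every iteration."""
    buffer = list(buffer)  # work on a copy
    for u in updates:
        subset = buffer[start:start + size]   # extract inside the loop
        subset = [v + u for v in subset]      # update the subset
        buffer[start:start + size] = subset   # insert inside the loop
    return buffer


def accumulate_hoisted(buffer, updates, start, size):
    """Same result, with the extract/insert pair moved out of the loop."""
    buffer = list(buffer)  # work on a copy
    subset = buffer[start:start + size]       # extract once, before the loop
    for u in updates:
        subset = [v + u for v in subset]      # only the subset is loop-carried
    buffer[start:start + size] = subset       # insert once, after the loop
    return buffer
```

Both versions produce the same buffer. The hoisted form touches the full buffer only twice, which is the kind of cleanup that becomes useful once the loop body has been vectorized.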

Depends on https://github.com/iree-org/iree/pull/17626

This patch adds support for distributing attention to multiple subgroups. Some points to note: - Due to some issues with layout analysis, we cannot yet do multiple n subgroups. This...

Clipping or clamping is defined as:

```
clip(x, min_value, max_value) = min(max(x, min_value), max_value)
```

Some backends can generate better instructions if it's known we are clamping a value. For...
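As a minimal sketch of the definition above, the clip formula can be written directly in Python. The function below is illustrative only and is not taken from the issue. It assumes `min_value <= max_value`, which the definition implicitly requires for the result to lie inside the range.

```python
def clip(x, min_value, max_value):
    """Clamp x to the closed range [min_value, max_value].

    Assumes min_value <= max_value.
    """
    return min(max(x, min_value), max_value)


if __name__ == "__main__":
    # Values below, inside, and above the range [0, 10].
    for value in (-5, 3, 42):
        print(value, "->", clip(value, 0, 10))
```

Written this way, the clamp is a `max` followed by a `min`, which is the composite pattern a backend would need to recognize in order to emit a dedicated clamp instruction.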

enhancement ➕
codegen
onboarding/codegen

### Request description # Motivation A pattern we notice in flash attention kernels is: ``` A: tensor B: tensor C: tensor D : tensor = matmul(A, B, C) E :...

enhancement ➕
codegen
onboarding/codegen

This PR teaches attention decomposition to set attributes for attention matmuls by passing attribute dictionaries to the iree_linalg_ext.online_attention operation. This allows us to further control codegen of matmuls (generally the root...

Since https://github.com/iree-org/iree/pull/18748, tensor.pad can be fused in with tiling. This patch combines the parallel and reduction padding passes into a single pass that pads at once, and the pads are...