iree
A retargetable MLIR-based machine learning compiler and runtime toolkit.
### What happened? For the given IR ```mlir module { func.func @"torch-jit-export"(%arg6: !torch.vtensor) -> (!torch.vtensor) attributes {torch.onnx_meta.ir_version = 6 : si64, torch.onnx_meta.opset_version = 21 : si64, torch.onnx_meta.producer_name = "pytorch",...
### Request description # Motivation A pattern we notice in flash attention kernels is: ``` A: tensor B: tensor C: tensor D : tensor = matmul(A, B, C) E :...
This reduces the default AArch64 matmul tile sizes from (8, 16, 1) to (6, 16, 1). Originally, (8, 16, 1) was chosen to attempt to use all available vector registers...
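The register-pressure argument can be illustrated with a little arithmetic. The sketch below is not taken from the change itself: it assumes the tile sizes are ordered (M, N, K), that elements are f32, and that accumulators live in 128-bit NEON registers, of which AArch64 has 32. The helper names are hypothetical.

```python
# Assumptions (not from the source): f32 elements, 128-bit NEON vector
# registers, and tile sizes ordered as (M, N, K).
VECTOR_REGISTERS = 32            # AArch64 exposes v0-v31
LANES_PER_REGISTER = 128 // 32   # four f32 lanes per 128-bit register


def accumulator_registers(tile_m: int, tile_n: int) -> int:
    """Registers needed to keep a tile_m x tile_n f32 accumulator tile live."""
    registers_per_row = -(-tile_n // LANES_PER_REGISTER)  # ceiling division
    return tile_m * registers_per_row


def spare_registers(tile_m: int, tile_n: int) -> int:
    """Registers left over for operands and temporaries."""
    return VECTOR_REGISTERS - accumulator_registers(tile_m, tile_n)


if __name__ == "__main__":
    for tile in [(8, 16, 1), (6, 16, 1)]:
        m, n, _ = tile
        print(tile, "accumulators:", accumulator_registers(m, n),
              "spare:", spare_registers(m, n))
```

Under these assumptions, an (8, 16, 1) tile takes every vector register for accumulators alone, while (6, 16, 1) leaves a few free for operands and temporaries.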
### Request description MAD instructions are similar to FMA instructions: they perform a multiplication and an addition within the same instruction. gfx942 supports packed versions, `V_PK_MAD_I16` and `V_PK_MAD_U16`, that should allow...
To ensure truncate ops get fused with their producers, don't fuse them with their consumer.
### What happened? I got a segfault in `mlir::iree_compiler::IREE::VM::translateModuleToBytecode`. Here is a [vm-translate-to-bytecode-crash.zip](https://github.com/user-attachments/files/16678033/vm-translate-to-bytecode-crash.zip). ### Steps to reproduce your issue Use `compile.sh` in the ZIP. ### What component(s) does this issue...
Changes needed to integrate https://github.com/llvm/llvm-project/pull/91475-
This PR adds a new pass that tries to reuse shared memory allocations in functions. This pass only does a very basic analysis, assuming no control flow operations (and is...
We rewrote the runtime-side `rocm` HAL implementation as `hip`. The corresponding compiler target backend is still called `rocm`. To be consistent and avoid confusion, let's rename the compiler target...