
A retargetable MLIR-based machine learning compiler and runtime toolkit.

Results 759 iree issues

Take this file as input: ``` func.func @conv2d_accumulate_2_32_32_32_times_3_3_64_dtype_i1_i1_i1(%lhs: tensor, %rhs: tensor, %acc: tensor) -> tensor { %result = linalg.conv_2d_nchw_fchw {dilations = dense : tensor, strides = dense : tensor} ins(%lhs,...

This flag https://github.com/iree-org/iree/blob/d834aa7357179e0d806f3634d2efe3af2fa45171/compiler/src/iree/compiler/Codegen/LLVMGPU/KernelConfig.cpp#L90 enables software prefetching for kernels using shared memory. Software prefetching is disabled by default, and only enabled by this flag. Over time, prefetching became part of GPU...

codegen
quality of life 😊
codegen/rocm

This flag https://github.com/iree-org/iree/blob/d834aa7357179e0d806f3634d2efe3af2fa45171/compiler/plugins/target/ROCM/ROCMTarget.cpp#L93 sets a waves-per-eu attribute for LLVM compilation on **every dispatch** to give the register allocator a hint. https://github.com/iree-org/iree/pull/17365 introduced a way to specify these LLVM func attributes...

codegen
codegen/llvm
quality of life 😊

This flag https://github.com/iree-org/iree/blob/d834aa7357179e0d806f3634d2efe3af2fa45171/compiler/src/iree/compiler/Codegen/Common/PolynomialApproximationPass.cpp#L17 disables polynomial approximation for most math dialect operations, for hardware that supports these math operations directly. It looks like some backends rely on this flag for performance...

codegen
quality of life 😊

Some multi-reduction dispatches take a long time to compile. For context, https://github.com/iree-org/iree/issues/18479 identifies numerical issues with the current pipeline and https://github.com/iree-org/iree/pull/18519 should solve this issue. But the compilation time for...

codegen

Via a runtime system to allow for multiple instances of the same program to share constants. The complication with implicit sharing is that we only want two of the same...

runtime
performance ⚡
hal/api

### Request description

We want to introduce an HTTP cache server to the Kubernetes cluster, as it will help with the build times for several jobs: linux_x64_clang in [ci_linux_x64_clang.yml](https://github.com/iree-org/iree/blob/main/.github/workflows/ci_linux_x64_clang.yml) linux_x64_clang_asan...

enhancement ➕
infrastructure

For broadcast + matmul kernels, the outermost dimension (the batch dimension) is tiled to 1. We want to fold these unit dimensions into `tensor.expand_shape` after distribution.
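As a rough illustration of why a batch dimension tiled to 1 can be folded away, the sketch below uses plain Python nested lists rather than MLIR. All function names are hypothetical, and `expand_unit_batch` is only a stand-in for what `tensor.expand_shape` would do to a 2-D result; it is not IREE's implementation.

```python
def matmul_2d(lhs, rhs):
    """Plain 2-D matmul on nested lists: [M][K] x [K][N] -> [M][N]."""
    k, n = len(rhs), len(rhs[0])
    return [[sum(row[p] * rhs[p][j] for p in range(k)) for j in range(n)]
            for row in lhs]


def expand_unit_batch(tile_2d):
    """Stand-in for tensor.expand_shape: [M][N] -> [1][M][N]."""
    return [tile_2d]


def batch_matmul_broadcast_rhs(lhs, rhs):
    """Reference: lhs is [B][M][K]; rhs [K][N] is broadcast across the batch."""
    return [matmul_2d(lhs_b, rhs) for lhs_b in lhs]


def tiled_batch_matmul(lhs, rhs):
    """Tile the batch dimension to 1 and compute each tile as a 2-D matmul."""
    out = []
    for b in range(len(lhs)):
        tile = lhs[b:b + 1]   # one batch tile, shape [1][M][K]
        folded = tile[0]      # drop the unit batch dim -> [M][K]
        # Compute in 2-D, then restore the unit batch dim on the result.
        out.extend(expand_unit_batch(matmul_2d(folded, rhs)))
    return out


lhs = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]   # [B=2][M=2][K=2]
rhs = [[1, 2], [3, 4]]                       # [K=2][N=2], shared by all batches
print(tiled_batch_matmul(lhs, rhs) == batch_matmul_broadcast_rhs(lhs, rhs))
```

Because each tile carries a batch extent of exactly 1, the inner computation never needs the batch index, which is the property that makes the fold possible.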

Failures are seen in the following tests: LLVMCPU/test/pipeline_pad_tests.mlir. An example of the IR from `pipeline_tile_and_fuse.mlir` is here: https://gist.github.com/nirvedhmeshram/3349f2739dfb529fa4800040bf1c8490 We need to verify that the generated IR is what we want and...

Pack ops can affect tiling decisions, so it is beneficial to generalize them. For example, for the IR below: ``` %5 = linalg.generic {indexing_maps = [affine_map (d0, d1)>], iterator_types =...