Max191

Results 49 issues of Max191

There were a few ONNX test suite failures in https://github.com/iree-org/iree/pull/17770, but it was decided to accept the failures for now, and fix them retroactively. This issue is for tracking these...

integrations/pytorch
integrations/onnx

When there is a buffer used inside of an `scf.forall` op that is defined outside of the `scf.forall`, bufferization will unconditionally bufferize out of place by default in order to...

benchmarks:cuda
benchmarks:x86_64
benchmarks:comp-stats
benchmarks:android-cpu
benchmarks:android-gpu
benchmarks:vulkan-nvidia

This PR adds reshape into interface tensor folding patterns to TileAndDistributeToWorkgroups. If there are reshapes between interface tensors and their users, then TileAndDistributeToWorkgroups can fail, so these patterns help to...

benchmarks:cuda
benchmarks:x86_64
benchmarks:comp-stats
benchmarks:android-cpu
benchmarks:android-gpu
benchmarks:vulkan-nvidia

Vectorizing a `linalg.copy` op can result in a sequence of
```
%extract = tensor.extract_slice %source
%read = vector.transfer_read %extract
%write = vector.transfer_read %dest
%insert = tensor.insert_slice %write into %dest
```
...

benchmarks:cuda
benchmarks:x86_64
benchmarks:comp-stats
benchmarks:android-cpu
benchmarks:android-gpu
benchmarks:vulkan-nvidia

The LLVMGPU bufferization copy function inserts barriers for shared memory copies. Some of the copies copy to and from equivalent subviews, which get folded away by canonicalizations. When this happens...

benchmarks:cuda
benchmarks:android-gpu
benchmarks:vulkan-nvidia

This adds an option to the DecomposeIm2colPass to unroll the resulting loop nest of the decomposition, and sets it to true by default. This is an easier form to handle...

This PR is the follow-up to https://github.com/iree-org/iree/pull/18063, implementing the fusion pass to move set_encoding ops into producer dispatch regions when the producer op is a LinalgOp. If there...

This PR adds a new pass that tries to reuse shared memory allocations in functions. This pass only does a very basic analysis, assuming no control flow operations (and is...
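
To illustrate the general idea of reusing allocations in straight-line code (no control flow), here is a minimal Python sketch. It is not the pass from this PR: the `Alloc` record, the `assign_offsets` helper, and the first-fit placement strategy are all hypothetical, chosen only to show how non-overlapping live ranges let two buffers share the same offset.

```python
from dataclasses import dataclass


@dataclass
class Alloc:
    """Hypothetical record for one shared memory allocation."""
    name: str
    size: int       # size in bytes
    first_use: int  # index of the first op that touches the buffer
    last_use: int   # index of the last op that touches the buffer


def assign_offsets(allocs):
    """First-fit placement: buffers with disjoint live ranges may overlap in memory.

    Assumes ops are in a single linear order (no control flow), so a live
    range is just the closed interval [first_use, last_use].
    """
    placed = []   # (offset, Alloc) pairs that already have an offset
    offsets = {}
    for a in sorted(allocs, key=lambda x: x.first_use):
        # Memory regions held by buffers whose live range overlaps a's.
        live = sorted(
            (off, p.size)
            for off, p in placed
            if p.last_use >= a.first_use and p.first_use <= a.last_use
        )
        offset = 0
        for off, size in live:
            if offset + a.size <= off:
                break  # a fits in the gap before this live region
            offset = max(offset, off + size)
        offsets[a.name] = offset
        placed.append((offset, a))
    total = max((offsets[a.name] + a.size for a in allocs), default=0)
    return offsets, total


allocs = [
    Alloc("A", 1024, first_use=0, last_use=2),
    Alloc("B", 512, first_use=1, last_use=3),
    Alloc("C", 1024, first_use=4, last_use=5),
]
offsets, total = assign_offsets(allocs)
print(offsets, total)
```

In this example `C` becomes live only after `A` and `B` are dead, so it can take offset 0 again, and the total footprint is smaller than the sum of the three sizes.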

Running SDXL int8 with aggressive fusion enabled produces different results from running without aggressive fusion enabled. ### Repro Instructions ### 1. Checkout https://github.com/iree-org/iree/tree/shared/sdxl_quantized in IREE 2. Clone https://github.com/nod-ai/sdxl-scripts and `cd...

This adds support for multiple M, N, and K dims in problems when deducing a GPUMMASchedule. The new heuristic is similar to the old one, but works on pairs of...