Max191 issues

ONNX Test failures after LLVM integrate

There were a few ONNX test suite failures in https://github.com/iree-org/iree/pull/17770, but it was decided to accept them for now and fix them in follow-up work. This issue tracks these...

Labels: integrations/pytorch, integrations/onnx

[Codegen] Do not consider parallel regions in bufferization analysis

When there is a buffer used inside of an `scf.forall` op that is defined outside of the `scf.forall`, bufferization will unconditionally bufferize out of place by default in order to...

Labels: benchmarks:cuda, benchmarks:x86_64, benchmarks:comp-stats, benchmarks:android-cpu, benchmarks:android-gpu, benchmarks:vulkan-nvidia

[Codegen] Add interface tensor reshape foldings to TileAndDistribute

This PR adds patterns to TileAndDistributeToWorkgroups that fold reshapes into interface tensors. TileAndDistributeToWorkgroups can fail when there are reshapes between interface tensors and their users, so these patterns help to...

Labels: benchmarks:cuda, benchmarks:x86_64, benchmarks:comp-stats, benchmarks:android-cpu, benchmarks:android-gpu, benchmarks:vulkan-nvidia
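As a rough illustration of what folding a reshape into an interface tensor means, the toy Python model below treats an interface tensor load as a read of a whole binding with a static shape. A reshape consuming that load is folded away by loading the binding directly with the reshaped shape. The `Load` and `Reshape` classes and `fold_reshape_into_load` are hypothetical stand-ins, not IREE's actual ops or pattern names, and the model assumes the binding is contiguous, row-major, and read in full.

```python
from dataclasses import dataclass
from math import prod
from typing import Optional, Tuple, Union


@dataclass(frozen=True)
class Load:
    """Hypothetical stand-in for loading a whole interface tensor (a binding)."""
    binding: str
    shape: Tuple[int, ...]


@dataclass(frozen=True)
class Reshape:
    """Hypothetical stand-in for an expand/collapse reshape of a producer value."""
    source: Union[Load, "Reshape"]
    shape: Tuple[int, ...]


def fold_reshape_into_load(op: object) -> Optional[Load]:
    """Fold reshape(load(binding)) into a load of the binding with the new shape.

    Assumes the binding is contiguous and row-major and the load reads all of
    it, so a reshape only reinterprets the same elements. The element-count
    check is a safety guard. Returns None when the pattern does not match.
    """
    if not isinstance(op, Reshape) or not isinstance(op.source, Load):
        return None
    if prod(op.shape) != prod(op.source.shape):
        return None
    return Load(op.source.binding, op.shape)
```

For example, `fold_reshape_into_load(Reshape(Load("arg0", (4, 8)), (32,)))` yields `Load(binding='arg0', shape=(32,))`, so the consumer reads the binding directly and no reshape sits between the interface tensor and its user.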

[Codegen] Add vector transfer + slice foldings in GenericVectorization

Vectorizing a `linalg.copy` op can result in a sequence of

```
%extract = tensor.extract_slice %source
%read = vector.transfer_read %extract
%write = vector.transfer_write %read, %dest
%insert = tensor.insert_slice %write into %dest
```

...

Labels: benchmarks:cuda, benchmarks:x86_64, benchmarks:comp-stats, benchmarks:android-cpu, benchmarks:android-gpu, benchmarks:vulkan-nvidia
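The slice foldings for a vectorized copy can be illustrated with a toy 1-D Python model. Since the PR description is truncated, this sketch assumes the two usual foldings: a `transfer_read` of an `extract_slice` becomes a read of the original source at an offset index, and an `insert_slice` of a `transfer_write` that covers the whole slice becomes a write directly into the destination. All functions here are hypothetical value-semantic stand-ins for the MLIR ops, not IREE code.

```python
from typing import List

Tensor = List[int]  # toy 1-D tensor with value semantics


def extract_slice(source: Tensor, offset: int, size: int) -> Tensor:
    """Toy 1-D tensor.extract_slice with unit stride."""
    return source[offset:offset + size]


def insert_slice(source: Tensor, dest: Tensor, offset: int) -> Tensor:
    """Toy 1-D tensor.insert_slice: returns a new tensor, dest is untouched."""
    result = list(dest)
    result[offset:offset + len(source)] = source
    return result


def transfer_read(source: Tensor, index: int, size: int) -> Tensor:
    """Toy in-bounds vector.transfer_read of `size` elements at `index`."""
    assert 0 <= index and index + size <= len(source), "out-of-bounds read"
    return source[index:index + size]


def transfer_write(vector: Tensor, dest: Tensor, index: int) -> Tensor:
    """Toy in-bounds vector.transfer_write on a tensor: returns a new tensor."""
    assert 0 <= index and index + len(vector) <= len(dest), "out-of-bounds write"
    return insert_slice(vector, dest, index)


def copy_unfolded(src: Tensor, dest: Tensor, src_off: int, dst_off: int, n: int) -> Tensor:
    """Slice + transfer sequence of the kind a vectorized copy can produce."""
    extract = extract_slice(src, src_off, n)
    read = transfer_read(extract, 0, n)
    write = transfer_write(read, extract_slice(dest, dst_off, n), 0)
    return insert_slice(write, dest, dst_off)


def copy_folded(src: Tensor, dest: Tensor, src_off: int, dst_off: int, n: int) -> Tensor:
    """Same copy after folding the slices into the transfers."""
    # transfer_read(extract_slice(src, o), i) == transfer_read(src, o + i)
    read = transfer_read(src, src_off, n)
    # insert_slice(transfer_write(v, slice, i), dest, o) == transfer_write(v, dest, o + i),
    # valid here because the write covers the whole slice
    return transfer_write(read, dest, dst_off)
```

Both versions compute the same tensor, but the folded one has no intermediate slices left for bufferization to materialize.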

[LLVMGPU] Don't create copies when subviews are equivalent

The LLVMGPU bufferization copy function inserts barriers for shared memory copies. Some of these copies read from and write to equivalent subviews, and canonicalization folds such copies away. When this happens...

Labels: benchmarks:cuda, benchmarks:android-gpu, benchmarks:vulkan-nvidia
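A minimal Python sketch of the idea: check whether the source and destination subviews are equivalent before emitting anything, so no barriers are left behind once the no-op copy would have been folded. The `Subview` class, `emit_shared_memory_copy`, and the barrier-copy-barrier sequence are assumptions for illustration, not the actual LLVMGPU copy function.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Subview:
    """Hypothetical description of a memref.subview: base buffer plus offsets/sizes/strides."""
    base: str
    offsets: Tuple[int, ...]
    sizes: Tuple[int, ...]
    strides: Tuple[int, ...]


def emit_shared_memory_copy(src: Subview, dst: Subview) -> List[str]:
    """Return the toy op sequence for copying src to dst.

    If both subviews describe exactly the same memory, the copy is a no-op, so
    emit nothing, including no barriers. Structural equality is conservative:
    equivalent subviews spelled differently (e.g. with dynamic values) are
    still copied.
    """
    if src == dst:
        return []
    # Assumed barrier placement around a shared memory copy.
    return ["barrier", f"copy {src.base} -> {dst.base}", "barrier"]
```

For two identical views, `emit_shared_memory_copy(a, a)` returns an empty list, while distinct views still get the copy surrounded by barriers.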

[Im2col] Add option to unroll decomposed im2col loops

This adds an option to the DecomposeIm2colPass to unroll the resulting loop nest of the decomposition, and sets it to true by default. This is an easier form to handle...
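To make "unrolling the decomposed loop nest" concrete, the Python sketch below contrasts a loop-nest im2col with a fully unrolled form in which every copy is enumerated ahead of time. It assumes a single-channel image, stride 1, and no padding, and it copies single elements, whereas the actual DecomposeIm2colPass works on tensor slices; all function names are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[int]]


def im2col_loops(image: Matrix, kh: int, kw: int) -> Matrix:
    """Loop-nest im2col: single channel, stride 1, no padding.

    Row r holds the kh x kw window of output pixel r (row-major), flattened.
    """
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    out: Matrix = []
    for oy in range(oh):
        for ox in range(ow):
            out.append([image[oy + ky][ox + kx] for ky in range(kh) for kx in range(kw)])
    return out


def unrolled_copies(h: int, w: int, kh: int, kw: int) -> List[Tuple[int, int, int, int]]:
    """Enumerate, ahead of time, every (dst_row, dst_col, src_y, src_x) copy.

    With static shapes, fully unrolling the loop nest yields straight-line code
    with exactly one such copy per tuple and no loop induction variables.
    """
    oh, ow = h - kh + 1, w - kw + 1
    return [(oy * ow + ox, ky * kw + kx, oy + ky, ox + kx)
            for oy in range(oh) for ox in range(ow)
            for ky in range(kh) for kx in range(kw)]


def run_copies(image: Matrix, copies: List[Tuple[int, int, int, int]], rows: int, cols: int) -> Matrix:
    """Execute the unrolled copies one by one."""
    out = [[0] * cols for _ in range(rows)]
    for r, c, y, x in copies:
        out[r][c] = image[y][x]
    return out
```

In the unrolled form every index is a constant, which is one reason such straight-line code can be easier for later patterns to match than a loop nest.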

[Flow] Add pass to fuse encoding ops into dispatch regions after hoisting

This PR is the follow-up to https://github.com/iree-org/iree/pull/18063, implementing the fusion pass to move set_encoding ops into producer dispatch regions when the producer op is a LinalgOp. If there...
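The fusion condition described above (move a hoisted `set_encoding` into the producer's dispatch region only when the producer is a LinalgOp) can be sketched with a toy IR in Python. The `Op` and `DispatchRegion` classes and `fuse_encoding_into_producer` are hypothetical, and the sketch omits the rewiring a real pass must do so that the region yields the encoded value.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)  # compare ops by identity, like SSA values
class Op:
    name: str
    operands: List["Op"] = field(default_factory=list)
    is_linalg: bool = False


@dataclass(eq=False)
class DispatchRegion:
    ops: List[Op]


def fuse_encoding_into_producer(encoding: Op, regions: List[DispatchRegion],
                                top_level: List[Op]) -> bool:
    """Move a hoisted set_encoding op into its producer's dispatch region.

    Precondition: `encoding` is in `top_level`. Fires only when the encoded
    value is produced inside a dispatch region by a Linalg op. A real pass
    would also make the region yield the encoded value instead of the raw
    one; this toy only moves the op.
    """
    producer = encoding.operands[0]
    if not producer.is_linalg:
        return False
    for region in regions:
        if producer in region.ops:
            top_level.remove(encoding)
            region.ops.append(encoding)
            return True
    return False
```

A non-Linalg producer, or a producer that is not inside any dispatch region, leaves the IR unchanged and the function returns `False`.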

[Codegen][GPU] Add pass to reuse shared memory buffers in simple cases

This PR adds a new pass that tries to reuse shared memory allocations in functions. This pass only does a very basic analysis, assuming no control flow operations (and is...
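Without control flow, liveness is simple: a buffer is live from its first use to its last use, and two buffers whose live ranges do not overlap can share memory. The Python sketch below applies a greedy slot assignment under that assumption. The program representation and `plan_shared_memory` are hypothetical and not the pass's actual analysis. A real GPU pass must also synchronize (e.g. with a barrier) before a slot is reused, which is omitted here.

```python
from typing import Dict, List, Set, Tuple


def plan_shared_memory(sizes: Dict[str, int], program: List[Set[str]]) -> Tuple[Dict[str, int], int]:
    """Greedily map each buffer to a reusable slot.

    `program` is straight-line code: program[i] is the set of buffers op i
    touches. A buffer is live from its first to its last use, so two buffers
    can share a slot when their live ranges do not overlap. Buffers that are
    never used get no slot. Returns (buffer -> slot index, total bytes needed).
    """
    first: Dict[str, int] = {}
    last: Dict[str, int] = {}
    for i, used in enumerate(program):
        for buf in used:
            first.setdefault(buf, i)
            last[buf] = i

    slot_of: Dict[str, int] = {}
    slot_free_after: List[int] = []  # last op index using each slot's current occupant
    slot_size: List[int] = []        # a slot must fit its largest occupant
    for buf in sorted(first, key=lambda b: first[b]):
        for s, end in enumerate(slot_free_after):
            if end < first[buf]:  # previous occupant is dead: reuse this slot
                break
        else:
            s = len(slot_free_after)  # no free slot: allocate a new one
            slot_free_after.append(-1)
            slot_size.append(0)
        slot_of[buf] = s
        slot_free_after[s] = last[buf]
        slot_size[s] = max(slot_size[s], sizes[buf])
    return slot_of, sum(slot_size)
```

For example, with buffers `A` (1024 bytes, used by ops 0-1), `B` (2048 bytes, op 1), and `C` (512 bytes, ops 2-3), `C` reuses `A`'s slot, so 3072 bytes are needed instead of 3584.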

[Flow][SDXL] Numerics different with vs. without aggressive fusion on SDXL

Running SDXL int8 with aggressive fusion enabled produces different results from running it without aggressive fusion.

### Repro Instructions

1. Check out https://github.com/iree-org/iree/tree/shared/sdxl_quantized in IREE
2. Clone https://github.com/nod-ai/sdxl-scripts and `cd...

[GPU] Support multiple contraction dims in MmaSchedules

This adds support for multiple M, N, and K dims in problems when deducing a GPUMMASchedule. The new heuristic is similar to the old one, but works on pairs of...