Han-Chung Wang
It introduces an `inferVectorSizesFromIR(Value val)` method, so every op can infer its input vector sizes through the use-def chain. This is important for fusion cases (e.g., `generic + pack`) because it assumed...
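As a rough illustration (the function name, shapes, and body below are made up for this sketch, not taken from the actual patch), a `generic + pack` fusion case looks something like the IR below: the `tensor.pack` carries no vector sizes of its own, so they have to be inferred from the producing `linalg.generic` through the chain.

```mlir
// Hypothetical fused dispatch: the pack op's vector sizes must be inferred
// from the producing linalg.generic through the use-def chain.
func.func @generic_pack_fusion(%src: tensor<32x128xf32>,
                               %dest: tensor<4x16x8x8xf32>) -> tensor<4x16x8x8xf32> {
  %empty = tensor.empty() : tensor<32x128xf32>
  %0 = linalg.generic {
      indexing_maps = [affine_map<(d0, d1) -> (d0, d1)>,
                       affine_map<(d0, d1) -> (d0, d1)>],
      iterator_types = ["parallel", "parallel"]}
      ins(%src : tensor<32x128xf32>) outs(%empty : tensor<32x128xf32>) {
    ^bb0(%in: f32, %out: f32):
      %1 = arith.addf %in, %in : f32
      linalg.yield %1 : f32
  } -> tensor<32x128xf32>
  // The pack's tile sizes (8x8) and outer dims (4x16) come from the producer's
  // 32x128 shape, which is what the inference walks back to find.
  %packed = tensor.pack %0 inner_dims_pos = [0, 1] inner_tiles = [8, 8]
      into %dest : tensor<32x128xf32> -> tensor<4x16x8x8xf32>
  return %packed : tensor<4x16x8x8xf32>
}
```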
I'm prototyping direct vectorization of the pack op, and found that some transfer_write/read ops (with masks) are not folded away. IR:
```mlir
func.func @main_dispatch_0_generic_32x128xD_f32xbf16_pack() {
  %cst = arith.constant 0.000000e+00 : bf16
  %c32_i64...
```
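For context, a minimal sketch of the kind of pattern involved (the function name, shapes, and values here are invented, not the actual repro): when the mask covers the whole vector, the masked transfer should be foldable to an unmasked one, and the problem reported above is that such masked transfers survive vectorization.

```mlir
// Hypothetical sketch: an all-true mask on a transfer_write that one would
// expect canonicalization to strip, leaving an unmasked transfer_write.
func.func @masked_write_should_fold(%v: vector<4x4xf32>,
                                    %dest: tensor<8x8xf32>) -> tensor<8x8xf32> {
  %c0 = arith.constant 0 : index
  %mask = vector.constant_mask [4, 4] : vector<4x4xi1>
  %w = vector.transfer_write %v, %dest[%c0, %c0], %mask
      {in_bounds = [true, true]} : vector<4x4xf32>, tensor<8x8xf32>
  return %w : tensor<8x8xf32>
}
```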
Coming from https://github.com/openxla/iree/issues/15661#issuecomment-1854762283, we observed that there is a bug in the PolynomialApproximation pass. I landed a [workaround](https://github.com/openxla/iree/commit/a4a6b4bb74df601ccd558ccc658fa599eae559f3), which rewrites f16 approximations to occur with f32 intermediates. File a new issue...
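To illustrate what "f16 approximations with f32 intermediates" means (a sketch only; `math.tanh` stands in for whichever math op is being approximated, and this is not the exact IR produced by the workaround): the f16 operand is extended to f32, the op is approximated in f32, and the result is truncated back to f16.

```mlir
// Hypothetical before/after shape of the workaround: compute the f16 math op
// through f32 so the polynomial approximation runs at f32 precision.
func.func @tanh_f16_via_f32(%arg0: f16) -> f16 {
  %0 = arith.extf %arg0 : f16 to f32
  %1 = math.tanh %0 : f32
  %2 = arith.truncf %1 : f32 to f16
  return %2 : f16
}
```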
We observed that the vectorization of a reverse-like tensor.extract op was wrong in https://github.com/openxla/iree/issues/16544. Input:
```mlir
func.func @foo_dispatch_0_generic_2x1x3_f32() {
  %c1 = arith.constant 1 : index
  %c0 = arith.constant 0 : index...
```
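For readers unfamiliar with the pattern, here is a simplified sketch of what "reverse-like tensor.extract" means (the function name and shapes are made up, not the repro from the issue): the generic reads its input at index `size - 1 - i` via tensor.extract, which vectorizes as a gather whose index computation was the reported source of the bug.

```mlir
// Hypothetical reverse-like access inside a linalg.generic body.
func.func @reverse_like(%src: tensor<4xf32>) -> tensor<4xf32> {
  %c3 = arith.constant 3 : index
  %empty = tensor.empty() : tensor<4xf32>
  %0 = linalg.generic {
      indexing_maps = [affine_map<(d0) -> (d0)>],
      iterator_types = ["parallel"]}
      outs(%empty : tensor<4xf32>) {
    ^bb0(%out: f32):
      %i = linalg.index 0 : index
      // Read the input in reverse order: src[3 - i].
      %rev = arith.subi %c3, %i : index
      %v = tensor.extract %src[%rev] : tensor<4xf32>
      linalg.yield %v : f32
  } -> tensor<4xf32>
  return %0 : tensor<4xf32>
}
```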
Coming from https://github.com/google/iree/issues/8712, I found that traces are not generated when running the test target. It's really inconvenient when debugging the issue. I have a commit which dumps inputs...
## Overview

This is the umbrella issue that collects tasks toward phase 1. In phase 1, we aim to provide a functional data-tiling GPU path with reasonable performance. In...
@Max191 and I looked at enabling the PadAndVectorDistribution pipeline and found that it failed in vector distribution in one of the cases. To repro:

`iree-opt --pass-pipeline='builtin.module(func.func(iree-llvmgpu-vector-distribute{test-layout}, canonicalize, cse))' ~/repro.mlir`

```mlir
func.func @foo()...
```