Han-Chung Wang issues

73 results


Update tiling sizes for ARM convolution configurations.

buildkite:benchmark

Generic ops are not fused and get pulled into the same dispatch

This is the IR from MobileNetV3. These two element-wise generic ops could be fused, but they are not. We'd like to fuse them into a single generic op,...

codegen

Missing vectorization for gather ops

We've been hitting issues with vectorizing table lookups. I had an offline discussion with @MaheshRavishankar. The main issue is that we don't handle the `tensor.extract` op in Linalg vectorization. There...

help wanted

codegen

codegen/llvm

Redundant buffer allocations in LinalgExt/Linalg ops when there are constants in outs

What happened? I found redundant buffer allocations in LinalgExt ops. They also happen in ordinary Linalg ops. The main issue is that a constant op is...

bug 🐞

codegen

Generalize ConvertConv2D1x1ToMatmulPass to account for NCHW cases

The pass only handles the Conv2DNhwcHwcfOp case. We should generalize it to handle NCHW cases. Filing an issue to track this.

codegen
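A 1x1 convolution (stride 1, no padding or dilation) is a matmul in either layout; what changes with NCHW is which operand the filter becomes. The NumPy sketch below checks this, assuming an FCHW filter layout for the NCHW case. The function names are hypothetical illustrations, not IREE code or the pass's actual rewrite.

```python
import numpy as np

def conv2d_1x1_nchw_direct(inp, filt):
    """Reference 1x1 conv. inp: (N, C, H, W); filt: (F, C, 1, 1), FCHW layout (assumed)."""
    n, c, h, w = inp.shape
    f = filt.shape[0]
    out = np.zeros((n, f, h, w), dtype=inp.dtype)
    for b in range(n):
        for o in range(f):
            for i in range(c):
                # Each output channel is a weighted sum of input channel planes.
                out[b, o] += filt[o, i, 0, 0] * inp[b, i]
    return out

def conv2d_1x1_nchw_as_matmul(inp, filt):
    """Same computation as a batched matmul with the filter as the LHS operand."""
    n, c, h, w = inp.shape
    f = filt.shape[0]
    lhs = filt.reshape(f, c)        # (F, C)
    rhs = inp.reshape(n, c, h * w)  # (N, C, H*W): spatial dims collapsed
    out = np.matmul(lhs, rhs)       # LHS broadcasts over N -> (N, F, H*W)
    return out.reshape(n, f, h, w)
```

In NHWC, by contrast, the input reshapes to an (N*H*W, C) matrix that multiplies a (C, F) filter on the right, so the NCHW case swaps the operand roles and collapses different dimensions.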

Missing support for vectorizing quantized convolution ops

Follow-up to https://github.com/google/iree/issues/8411: quantized convolution ops are not vectorized. This introduces temporary buffer allocations because the types mismatch. We landed https://github.com/google/iree/pull/8526 to work around it. Ideally, we'd like to...

codegen

codegen/llvm

Adds passes for (pad + consumer) to the CPU pipeline and enables benchmarks.

The benchmarks are tracked under experimental-flags.

buildkite:benchmark

buildkite:benchmark-x86_64

buildkite:benchmark-riscv

[WIP] Benchmark pad + conv on CPU

Slow quantized matmul in MobileBert on c2-standard-16

[Codegen] Add support for vectorizing tensor.unpack ops with masking.