Han-Chung Wang
This is the IR from MobileNetV3. These two element-wise generic ops can be fused, but currently they are not. We'd like to fuse them into a single generic op,...
We've been hitting issues with vectorizing table lookups. I had an offline discussion with @MaheshRavishankar. The main issue is that we don't handle the `tensor.extract` op in Linalg vectorization. There...
### What happened? I found that there are redundant buffer allocations in LinalgExt ops. This also happens in regular Linalg ops. The main issue is that a constant op is...
The pass only handles the `Conv2DNhwcHwcfOp` case. We should generalize it to handle NCHW cases as well. Filing this issue to track it.
As a follow-up to https://github.com/google/iree/issues/8411, the quantized convolution ops are not vectorized. This introduces temporary buffer allocations because of type mismatches. We landed https://github.com/google/iree/pull/8526 to work around it. Ideally, we'd like to...
The benchmarks are tracked under experimental-flags.
IREE is about 2x slower than the baseline on c2-standard-16 in the single-threaded configuration (526 ms vs. 278 ms). Some quantized GEMMs are very slow in this case. ## dispatch_8_matmul_384x128x512 This is a fill...