Diego Caballero
Thanks, Hanhan! Could you please elaborate on why we have to limit the fusion transformation to x86 CPUs and we can't apply it generally to all the backends?
Naively disabling unrolling for generic ops with transposes leads to 34% and 30% improvement on MobileBERT-quant and fp on RISC-V, respectively. However, to do something meaningful here we would need...
https://github.com/iree-org/iree/pull/10287#issuecomment-1241256362 shows the magnitude of the problem. When we lower a `tosa.rescale` operation to Arith before the vectorizer and expose its mixed-length types to it (`i8`, `i32` and `i64`), we...
I still see ordered horizontal reductions:
```
 7.49 │ add      s4, s2, s3
16.00 │ vle32.v  v11, (s4)
 8.08 │ addi     s3, s3, 64
61.02 │ vfadd.vv v10, v10, v11...
```
There seems to be an issue with the `reassociate-fp-reduction` flag and threading. Note the `-mlir-disable-threading` flag and `reassoc` in the output:
```
iree-compile -iree-input-type=tosa -output-format=vm-bytecode -iree-hal-target-backends=llvm-cpu -iree-llvm-target-triple=riscv64 -iree-llvm-target-cpu=generic-rv64 -iree-llvm-target-abi=lp64d -iree-llvm-target-cpu-features="+m,+a,+f,+d,+v" -riscv-v-vector-bits-min=512...
```
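For context on why reassociation matters for these reductions: floating-point addition is not associative, so a compiler may only split a sum into per-lane partial sums when it is explicitly allowed to reassociate. The snippet below is a minimal sketch in plain Python, not IREE or LLVM code; `ordered_sum`, `lanewise_sum`, and the input values are hypothetical and chosen only to make the difference visible.

```python
def ordered_sum(values):
    """Strictly in-order reduction: one accumulator, one add per element."""
    acc = 0.0
    for v in values:
        acc += v
    return acc


def lanewise_sum(values, lanes):
    """Reassociated reduction: one partial sum per lane, combined at the end.

    This mimics what a vectorized loop does when reassociation is allowed.
    """
    partial = [0.0] * lanes
    for i, v in enumerate(values):
        partial[i % lanes] += v
    return ordered_sum(partial)


# Hypothetical input: 1.0 is far below the rounding granularity of 1e17.
data = [1e17, 1.0, -1e17, 1.0]

print(ordered_sum(data))      # in-order result
print(lanewise_sum(data, 2))  # two-lane reassociated result
```

The two results differ for this input because the in-order sum loses the first `1.0` when adding it to `1e17`, while the two-lane version adds the small values together in their own lane. That difference is why the ordered form is kept unless reassociation is requested.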
@vmurali, please coordinate with @rsuderman.
Thanks for the pointer! Ok, let me keep this issue open to make sure the specific dispatches that I see on RISC-V are addressed. Assigning this to @pzread as he...
> We need a pattern to rewrite quantized convolution to generic + convolution, like what we've done for quantized matmul. I think @bjacob and @rsuderman can also help here. As...
Adding @vmurali to this issue so that he can coordinate with Rob in case something is needed at codegen level or Rob needs some help.
Hey Rob, could you please comment on the current state of this? I think you mentioned you were not seeing the expected performance in some of the cases. What about...