Diego Caballero
Wow... the vectorizer must be doing really badly when the i64 ops are exposed: Benchmark Name | Average Latency (ms) | Median Latency (ms) | Latency Standard Deviation (ms) --...
Repro: `llc -mtriple=riscv64 -mattr=+v -verify-machineinstrs < repro.ll` ``` ; Supported: ; vwmul.vv define @single_vwmul_i8_i16( %va, %vb) { %vc = sext %va to %vd = sext %vb to %ve = mul...
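For readers less familiar with the pattern in the repro, here is a minimal scalar sketch of what the `sext` + `sext` + `mul` sequence computes per element when going from i8 to i16. The function names are hypothetical and exist only for illustration. This models the arithmetic only; it says nothing about what `llc` actually selects.

```python
def sext_i8_to_i16(x: int) -> int:
    """Treat the low 8 bits of x as a signed i8 and sign-extend the value."""
    x &= 0xFF
    return x - 0x100 if x & 0x80 else x


def wrap_i16(x: int) -> int:
    """Wrap an integer to the signed 16-bit range, like an i16 result."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def widening_mul_i8_i16(va, vb):
    """Element-wise model of: sext both i8 operands to i16, then mul in i16."""
    return [wrap_i16(sext_i8_to_i16(a) * sext_i8_to_i16(b)) for a, b in zip(va, vb)]


if __name__ == "__main__":
    print(widening_mul_i8_i16([-128, 127, -1], [-128, 127, 2]))
```

Because the product of two i8 values always fits in i16, the widened multiply never loses bits, which is why this shape can map onto a single widening instruction.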
Assigning this to @qcolombet for now so that he can provide some feedback when he has the time.
Awesome! Both implementations lead to 18% and 12% improvement on MobileBERT-quant and PersonDetect, respectively. I think #2 is a good starting point! A few comments: - There is another `hasOneUse`...
> I have to look closer, IIRC it's possible to do that, but that kind of transformations doesn't fit nicely in SDISel framework. I think it's worth having this discussion...
Thanks @qcolombet for pushing this forward! I think we should find a path to the longer-term approach. I was looking into the RVV spec and, in addition to adds,...
I can reproduce performance after integration. Great!
Hi @banach-space! Welcome to IREE and sorry for our delayed response! We are not currently working on this issue so happy to help if you plan to spend some cycles...
Let's keep this issue open to address the missing unrolling so that we don't have to open a new one just for that.
That sounds good, actually. We don't have to fix everything. How difficult would it be to go with the vector unrolling approach? Did you report the issue with alloca and bufferization to...