Bruce Lai
@Xhark Thanks for your response! @teijeong Is there any update on this issue?
Hi @fbarchard Could you help review this? After applying this commit, the softmax operator can fully exploit the RVV implementation.
Hi @alankelly Could you help review this PR?
Hi @fbarchard @alankelly This PR adds support for RVV x32-packw. If you have some free time, please help review it.
> Could you use a strided load to read each vector with a single instruction?

Using a strided segment load can give better performance.

> packw-x2v means a 4x2v gemm kernel...
Hi @fbarchard I got the idea. > But I think instead of calling a common function, it will need a custom x8-packw that does 4 bytes at a time in...
@alankelly I've rebased it.
Hi @hcindyl @ScottTodd I've tried this and now it can run correctly. It takes around 260 seconds on my machine. However, it looks like `--timeout 900` doesn't work now. ...
The conv test is slow. https://github.com/openxla/iree/blob/26d041fda7c103e7512453d3177e2ffc439e66c8/tests/e2e/regression/lowering_config.mlir#L34-L54 I can reduce the test case and open a PR.