rvv-bench

Missing valid instructions

Open dzaima opened this issue 1 year ago • 1 comments

  • Widening reductions (vfwredosum.vs, vfwredusum.vs, vwredsum.vs, vwredsumu.vs) should allow LMUL=8
  • vrgatherei16.vv should only disallow LMUL=8 for e8
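For reference, both points can be sketched as small legality checks. This is only an illustration of the RVV 1.0 rules as I understand them, not code from rvv-bench: the function names are made up, and ELEN=64 is an assumption. The index operand of vrgatherei16.vv has EEW=16, so its EMUL is (16/SEW)*LMUL and has to stay within 1/8..8, while a widening reduction writes a scalar into a single register, so only 2*SEW <= ELEN matters.

```python
from fractions import Fraction

def vrgatherei16_index_emul(sew, lmul):
    # The index operand always uses 16-bit elements: EMUL = (16 / SEW) * LMUL.
    return Fraction(16, sew) * Fraction(lmul)

def vrgatherei16_legal(sew, lmul):
    # Hypothetical helper: the index register group must fit in 1/8..8 registers.
    emul = vrgatherei16_index_emul(sew, lmul)
    return Fraction(1, 8) <= emul <= 8

def widening_reduction_legal(sew, lmul, elen=64):
    # Hypothetical helper, integer LMUL only. The result is a scalar of width
    # 2*SEW held in one register, so the source LMUL (even 8) does not matter.
    # elen=64 is an assumption about the target.
    return lmul in (1, 2, 4, 8) and 2 * sew <= elen

# At LMUL=8, which SEW values are rejected for vrgatherei16.vv?
rejected_at_m8 = [sew for sew in (8, 16, 32, 64) if not vrgatherei16_legal(sew, 8)]
```

With these rules `rejected_at_m8` contains only e8, since that is the one case where the index group would need 16 registers.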

dzaima avatar Nov 14 '23 20:11 dzaima

The first point should be fixed now, thanks. The current design just masks standalone LMUL and SEW, so I can't enable vrgatherei16.vv for LMUL=8 with SEW!=8 for now. I'll have to look into restructuring the code, or allowing special cases.
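A quick way to see the limitation, as a sketch: if an instruction carries one set of allowed SEWs and one set of allowed LMULs, the allowed combinations are always a full cross product, and "everything except e8/m8" is not one. The representation below is hypothetical and is not how rvv-bench stores its masks.

```python
from itertools import combinations, product

SEWS = (8, 16, 32, 64)
LMULS = (1, 2, 4, 8)

def subsets(items):
    # Every possible standalone mask over the given values.
    return [set(c) for r in range(len(items) + 1) for c in combinations(items, r)]

def expressible_with_standalone_masks(target):
    # True if some (SEW mask, LMUL mask) pair selects exactly the target combinations.
    return any(
        set(product(sews, lmuls)) == target
        for sews in subsets(SEWS)
        for lmuls in subsets(LMULS)
    )

# What vrgatherei16.vv needs: every combination except SEW=8 with LMUL=8.
wanted = {(s, l) for s, l in product(SEWS, LMULS) if not (s == 8 and l == 8)}
everything = set(product(SEWS, LMULS))
```

`expressible_with_standalone_masks(wanted)` comes out false while the full set is expressible, so this case needs a per-(SEW, LMUL) mask or a special case.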

camel-cdr avatar Nov 15 '23 15:11 camel-cdr

The latest commit 7b3f7b6 fixes this, I'll update the results page soon.

I've now updated the instruction cycle count measurement code to remove the destination vector dependency that processors without vl prediction suffer from. This fixes the weird 4, 4, 5, 8 LMUL scaling on the C908; with proper measurements it's now 1, 2, 4, 8. Implementing that required a big rewrite and yet another preprocessor, but the code now allows for fine-grained SEW and LMUL masking.
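A toy model of why a dependency on the destination flattens the scaling at small LMUL. The numbers are hypothetical (a fixed 4-cycle latency and one cycle of occupancy per register in the group) and are not C908 measurements; the model only reproduces the general shape, not the exact 4, 4, 5, 8.

```python
def cycles_per_instruction(lmul, dependent, latency=4):
    # Assumed: the unit is busy for one cycle per vector register in the group.
    occupancy = lmul
    # A dependent instruction also has to wait for the previous result,
    # so the measured cost is whichever of the two is larger.
    return max(occupancy, latency) if dependent else occupancy

LMULS = (1, 2, 4, 8)
chained = [cycles_per_instruction(m, dependent=True) for m in LMULS]
independent = [cycles_per_instruction(m, dependent=False) for m in LMULS]
```

In this model the chained numbers are latency-bound until LMUL is large enough to hide the latency, while the independent ones scale linearly with LMUL.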

camel-cdr avatar May 20 '24 00:05 camel-cdr

A couple months ago I was working on a very-generated risc-v/rvv instruction benchmarking thing (as in, JS generates 570MB of assembly which then becomes a 129MB binary) and it's largely complete (having things like separate throughput & latency tests, register cycling galore for removing dependencies (also tests intentionally leaving them in), precise argument initialization, testing different constants where applicable; rather inspired from uops.info) but I just kinda stopped working on it and haven't published it (doesn't help that I have no risc-v hardware).

here's a screenshot of qemu (VLEN=256) timings in it :)

Probably should put in the last bits of effort on that, but the leftovers are rather annoying (making the UI less clunky; displaying multiple archs in the same table (questions on vl/vlmax matching); probably 0.7.1 support otherwise the previous point is somewhat pointless; have latency tests between different-width/type operands which is a massive mess largely due to it being technically impossible to do properly)
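To illustrate the latency/throughput split and the register cycling mentioned above, here is a minimal sketch of a generator that emits the two kinds of instruction sequences as text. It is not the actual JS generator; the function names and register choices are made up, and it only handles integer LMUL and unmasked three-operand instructions.

```python
def register_groups(lmul):
    # Register numbers that are valid group bases for an integer LMUL.
    return list(range(0, 32, lmul))

def latency_chain(op, lmul, count):
    # Each instruction reads the register it just wrote, so they serialize.
    base = register_groups(lmul)[0]
    return [f"{op} v{base}, v{base}, v{base}" for _ in range(count)]

def throughput_block(op, lmul, count):
    # One fixed source group; destinations cycle over all remaining groups
    # so that consecutive instructions do not depend on each other.
    groups = register_groups(lmul)
    src = groups[-1]
    dests = groups[:-1]
    return [f"{op} v{dests[i % len(dests)]}, v{src}, v{src}" for i in range(count)]

chain = latency_chain("vadd.vv", 2, 3)
block = throughput_block("vadd.vv", 8, 4)
```

At LMUL=8 there are only three destination groups left to cycle through, and a real generator would also have to keep v0 free for masked instructions.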

dzaima avatar May 20 '24 02:05 dzaima

The table was also inspired by uops.info, but mine is way less sophisticated than that or yours.

I was looking into adding RISC-V support to llvm-exegesis, which does something very similar, but I need to have bare-metal support to test on RTL simulations.

> probably 0.7.1 support otherwise the previous point is somewhat pointless

I'm planning to drop 0.7.1 support soon, since I've now got two rvv 1.0 boards, don't have ssh access to a C920 anymore, and there will be some more releases this year.

camel-cdr avatar May 20 '24 11:05 camel-cdr