riscv-torture
Bad coverage of compressed ISA
I'm using riscv-torture to test my RV32IC implementation. For this I create RV32I test cases and build them with -march=RV32IC. See https://github.com/cliffordwolf/picorv32/tree/master/scripts/torture for my test setup (config settings in riscv-torture-rv32.diff).
For the most part the generated code does not map to compressed instructions. The only exception is an occasional slli instruction. (The .word at the end of a pseg often maps to some random compressed insns, but they are never executed, so that does not count.) Most psegs look more or less like this:
000001cc <pseg_70>:
1cc: 00100393 li t2,1
1d0: 4067bc13 sltiu s8,a5,1030
1d4: 03fe slli t2,t2,0x1f
1d6: 6c300613 li a2,1731
1da: 7c360593 addi a1,a2,1987
1de: 00b66463 bltu a2,a1,1e6 <pseg_70+0x1a>
1e2: 5f60206f j 27d8 <crash_forward>
1e6: 00100d13 li s10,1
1ea: fff00493 li s1,-1
1ee: 00420fb3 add t6,tp,tp
1f2: 3e8c99e3 bne s9,s0,de4 <pseg_71>
1f6: f7d47537 lui a0,0xf7d47
For testing purposes I've removed the .word sections at the end of the psegs so I am left with something that only contains "real" instructions. The example I am looking at right now contains 2485 instructions. Only 82 of those instructions (about 3%) are compressed, and almost all of them are slli instructions:
2 add
3 ebreak
4 j
1 or
69 slli
3 xor
The ebreak insns are part of my RVTEST_FAIL / RVTEST_PASS macros. Two of the four j insns are part of the "static" frame that is included in every test case. So not counting the slli insns, this whole test case effectively contains 8 compressed instructions (about 0.3%).
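A tally like the one above can be reproduced from a disassembly listing. The following is only a minimal sketch: it assumes objdump-style lines of the form "address: encoding mnemonic operands" with the raw encoding printed as a single hex word, so that four hex digits indicate a 16-bit (compressed) instruction. The function name count_compressed is made up for this illustration and is not part of riscv-torture.

```python
import re
from collections import Counter

# Matches lines such as "1d4: 03fe slli t2,t2,0x1f":
# address, raw encoding, mnemonic. Labels like "000001cc <pseg_70>:" do not match.
INSN_RE = re.compile(r"^\s*([0-9a-f]+):\s+([0-9a-f]+)\s+(\S+)")


def count_compressed(disasm_lines):
    """Return (total instruction count, Counter of compressed mnemonics).

    Assumption: an encoding printed with 4 hex digits is a 16-bit insn.
    """
    total = 0
    compressed = Counter()
    for line in disasm_lines:
        m = INSN_RE.match(line)
        if not m:
            continue  # labels, section headers, blank lines
        total += 1
        if len(m.group(2)) == 4:
            compressed[m.group(3)] += 1
    return total, compressed


# A few lines taken from the pseg shown earlier in this issue.
sample = [
    "000001cc <pseg_70>:",
    "1cc: 00100393 li t2,1",
    "1d0: 4067bc13 sltiu s8,a5,1030",
    "1d4: 03fe slli t2,t2,0x1f",
    "1d6: 6c300613 li a2,1731",
]
total, compressed = count_compressed(sample)
print(total, dict(compressed))
```

Data words that the disassembler decodes as instructions would be counted as well, so this only gives a meaningful number once the .word sections have been removed, as described above.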
Would it be possible to add a "compressed" option to riscv-torture that makes riscv-torture select register / immediate combinations that map to compressed insns with a much higher probability? Maybe randomly switching between the "compressed" probabilities and the current behavior on a per-pseg basis?
The coverage should be improved. Looking at the code, the main reason RVC instructions aren't being selected is that the register specifiers and immediates aren't in range.
Since only 1/4 of the registers are available to many RVC instructions, instructions that take two register specifiers already only have a 1/16 chance of being selected. For others, RVC instructions are only available if the source and destination match, a 1/32 chance.
Likewise, uniform-random immediates are almost always out of range.
The solution is to bias the register and immediate selector. I actually don't think this should be a new option to torture, just a baked-in feature.
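To make the numbers above concrete, here is a small sketch of the arithmetic and of one possible biasing scheme. It is not how riscv-torture's randomizer works (that lives in the Scala generator); pick_reg, effective_rvc_probability and the 3/4 bias are hypothetical choices for illustration. It assumes the 8-register RVC subset x8..x15 and, for the immediate case, a 6-bit signed field as used by instructions such as c.li.

```python
import random
from fractions import Fraction

NUM_REGS = 32            # x0..x31
RVC_REGS = range(8, 16)  # x8..x15, reachable through 3-bit RVC register fields


def p_two_regs_compressible(p_single):
    """Chance that two independently drawn registers are both in the RVC subset."""
    return p_single * p_single


def pick_reg(rng, rvc_bias):
    """Hypothetical biased selector: with probability rvc_bias draw from x8..x15,
    otherwise fall back to a uniform draw over all 32 registers."""
    if rng.random() < rvc_bias:
        return rng.choice(RVC_REGS)
    return rng.randrange(NUM_REGS)


def effective_rvc_probability(rvc_bias):
    """Overall chance that pick_reg lands in x8..x15 (the uniform fallback
    can also hit the subset)."""
    uniform = Fraction(len(RVC_REGS), NUM_REGS)
    return rvc_bias + (1 - rvc_bias) * uniform


uniform = Fraction(len(RVC_REGS), NUM_REGS)
print(p_two_regs_compressible(uniform))  # 1/16, as stated above
print(Fraction(1, NUM_REGS))             # 1/32: chance that rs1 equals a given rd
print(Fraction(2**6, 2**12))             # 1/64: uniform 12-bit immediate fits 6 bits

biased = effective_rvc_probability(Fraction(3, 4))
print(biased, p_two_regs_compressible(biased))  # 13/16 169/256
```

With a bias of 3/4 the two-register case goes from 1/16 to roughly 2/3, which is the kind of shift a biased selector would need to produce. The same idea would apply to immediates by drawing from the short range most of the time.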
I agree with Andrew that it'd be nice if we can accomplish this by improving the randomizer (it already understands biasing).
But we'll have to see if it can get us all the way there. For example, I'm really surprised you're not seeing compressed branches. I'm pretty sure we have no control over the immediate offset of them though. :(
We're also going to want to make sure the stats torture spits out include the compressed data, which I'm pretty sure they don't. There are a lot of unique cases in RVC and we'll want to make sure we hit them at least sometimes.
Clifford,
I suspect that we probably won't be able to find the cycles to address this issue in the near future. However, you may be able to find some low-hanging fruit by exploring generator/src/main/scala/Rand.scala, and you can try adding more hand-crafted sequences to places like SeqALU.scala (e.g., adding more sequences that share common register operands).
That should at least be able to greatly improve the compressed ALU-op coverage. Let us know how it goes.
Jfyi: In addition to riscv-torture I'm now using csmith to generate test cases that I compile using riscv gcc. It's still under construction, but if anyone is interested, here are the scripts I'm using for the csmith-based tests:
https://github.com/cliffordwolf/picorv32/tree/master/scripts/csmith
That looks very interesting; keep us in the loop on how it pans out.
jfyi: https://github.com/csmith-project/csmith/issues/34
I now have a working setup with csmith. Jfyi, here is what I'm doing:
First I use csmith to create a test case (my CPU is RV32, so I create a platform.info file before running csmith that reflects that fact):
echo "integer size = 4" > platform.info
echo "pointer size = 4" >> platform.info
csmith --no-packed-struct -o test.c
Then I compile the test case with my host gcc (with -m32) and run it. Usually this prints something like checksum = CFC47D24 almost immediately. But sometimes it just hangs. I assume this may be a gcc bug exposed by csmith. Anyway, if the binary does not finish in less than 2 seconds of CPU time, I simply discard the test case and restart the process with a new one.
Next I build a RISCV32 ELF file (using newlib, my own syscalls.c and a simple "boot loader" at address 0 that does things like setting the stack pointer and then jumping to the newlib entry point at 0x10000). I then run this binary in a patched version of spike. If it takes more than 1000000 instructions to execute the test case then I discard it and restart with another test case. If running it in spike yields a different checksum than the version built with my host gcc, then I also discard it.
Finally I run the same ELF file in a verilator model of my PicoRV32 core and compare the generated checksums. I terminate with an error if the checksum is different from what I got with spike and with the binary built with the host gcc.
On average it takes less than two seconds to generate and process one test case on my machine with this procedure.
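The decision logic of the procedure above can be summarized in a few lines. This is only a sketch under stated assumptions, not the actual scripts in scripts/csmith: the names judge, parse_checksum and Verdict are hypothetical, a host run that timed out is represented by None, and the program outputs are assumed to contain a line of the form "checksum = CFC47D24" as shown earlier.

```python
import re
from enum import Enum

MAX_SPIKE_INSNS = 1_000_000  # instruction budget for the spike run

CHECKSUM_RE = re.compile(r"checksum = ([0-9A-Fa-f]+)")


class Verdict(Enum):
    DISCARD = "discard"  # test case is unusable, generate a new one
    PASS = "pass"        # core agrees with spike and the host binary
    FAIL = "fail"        # core disagrees: stop with an error


def parse_checksum(output):
    """Extract the hex checksum from program output, or None if absent."""
    m = CHECKSUM_RE.search(output or "")
    return m.group(1).upper() if m else None


def judge(host_out, spike_out, spike_insns, core_out):
    """Classify one test case.

    host_out is None when the host binary exceeded its CPU time limit.
    """
    host = parse_checksum(host_out)
    if host is None:
        return Verdict.DISCARD          # host binary hung or printed nothing
    if spike_insns > MAX_SPIKE_INSNS:
        return Verdict.DISCARD          # too long to simulate
    spike = parse_checksum(spike_out)
    if spike != host:
        return Verdict.DISCARD          # possible toolchain issue, not a core bug
    # spike == host here, so comparing the core against spike covers both.
    return Verdict.PASS if parse_checksum(core_out) == spike else Verdict.FAIL


print(judge("checksum = CFC47D24", "checksum = CFC47D24", 5000, "checksum = CFC47D24"))
```

Only the last comparison can report an error; every earlier mismatch leads to a discard, which is why this flow would not flag a bug in riscv-gcc.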
Here is a typical test case as generated by csmith: http://scratch.clifford.at/test_2146419290.c
I discard maybe 20% of all test cases generated by csmith, almost all of them because the binary built by the host gcc already hangs. Occasionally I discard one because it takes more than 1000000 instructions to complete in spike. If there is a bug in e.g. riscv-gcc I would simply discard the test case using this methodology. But it should be fairly easy to build something similar that is using csmith and spike to look for bugs in riscv-gcc.
I have now run a few thousand iterations of this procedure (see make loop in my scripts/csmith/). So far I have not found a bug in my processor. (I have tested the error path by introducing a bug into my processor, just to be sure it's actually working.)
A while back you wrote
[..] we probably won't be able to find the cycles to address this issue in the near future. [..]
For me this is now a solved issue because with csmith I get a good coverage of insn patterns that gcc will actually produce. In addition to what I do with riscv-torture this gives me high confidence that my core is working as expected.
So please feel free to close this issue if you feel like it.
Thanks for the writeup on csmith. The more disparate tools to test our processors the better.
I'm leaving this issue open because it's still an issue that should be addressed, even if it languishes for a good while.