koika
koika copied to clipboard
Avoidable stalling in the RISC-V example
valid_rs1
and valid_rs2
are never read in the RISC-V example. Making sure that the values really correspond to rs1
or rs2
instead of stalling whenever at least one of them is associated with something other than 0 in the scoreboard yields better performance:
-- Running tests with Cuttlesim --
_objects/rv32i.v/rvcore.cuttlesim.opt tests/_build/rv32i/integ/rvbench_qsort.rv32 -1
[before] real: 0m0.018s user: 0m0.017s sys: 0m0.001s
[after ] real: 0m0.013s user: 0m0.013s sys: 0m0.000s
_objects/rv32i.v/rvcore.cuttlesim.opt tests/_build/rv32i/integ/rvbench_median.rv32 -1
[before] real: 0m0.012s user: 0m0.012s sys: 0m0.000s
[after ] real: 0m0.009s user: 0m0.009s sys: 0m0.000s
_objects/rv32i.v/rvcore.cuttlesim.opt tests/_build/rv32i/integ/img.rv32 -1
[before] real: 0m0.213s user: 0m0.197s sys: 0m0.016s
[after ] real: 0m0.157s user: 0m0.137s sys: 0m0.019s
_objects/rv32i.v/rvcore.cuttlesim.opt tests/_build/rv32i/integ/primes.rv32 -1
[before] real: 0m4.413s user: 0m4.407s sys: 0m0.003s
[after ] real: 0m3.084s user: 0m3.080s sys: 0m0.001s
_objects/rv32i.v/rvcore.cuttlesim.opt tests/_build/rv32i/integ/morse.rv32 -1
[before] real: 0m1.583s user: 0m1.581s sys: 0m0.000s
[after ] real: 0m1.142s user: 0m1.141s sys: 0m0.000s
-- Running tests with Verilator --
_objects/rv32i.v/obj_dir.opt/Vtop +VMH=tests/_build/rv32i/integ/rvbench_qsort.vmh -1
[before] real: 0m0.033s user: 0m0.033s sys: 0m0.000s
[after ] real: 0m0.031s user: 0m0.030s sys: 0m0.001s
_objects/rv32i.v/obj_dir.opt/Vtop +VMH=tests/_build/rv32i/integ/morse.vmh -1
[before] real: 0m2.692s user: 0m2.689s sys: 0m0.001s
[after ] real: 0m2.547s user: 0m2.544s sys: 0m0.000s
_objects/rv32i.v/obj_dir.opt/Vtop +VMH=tests/_build/rv32i/integ/rvbench_median.vmh -1
[before] real: 0m0.022s user: 0m0.022s sys: 0m0.000s
[after ] real: 0m0.021s user: 0m0.021s sys: 0m0.000s
_objects/rv32i.v/obj_dir.opt/Vtop +VMH=tests/_build/rv32i/integ/primes.vmh -1
[before] real: 0m8.383s user: 0m8.371s sys: 0m0.002s
[after ] real: 0m7.892s user: 0m7.885s sys: 0m0.001s
_objects/rv32i.v/obj_dir.opt/Vtop +VMH=tests/_build/rv32i/integ/img.vmh -1
[before] real: 0m0.401s user: 0m0.388s sys: 0m0.013s
[after ] real: 0m0.378s user: 0m0.361s sys: 0m0.017s
I kept only the tests that run for more than 2ms to keep this short. This results in tests taking roughly 25% less time with Cuttlesim and 5% less time for Verilator. Of course, your mileage may vary. I did not check the effects on synthesis.
See related commit 66b59e9c.
Good stuff! :) Looping in @threonorm , who wrote the code; I wonder if the same pattern existed in the Bluespec version of the code.
This results in tests taking roughly 25% less time with Cuttlesim and 5% less time for Verilator.
I like this change ^^
Yes, we currently stall conservatively for source register (but do not add dependencies for destination registers which are not real destinations). Mainly because the design we started from was doing that (it is coming from a class).
I don't have a strong opinion, we can merge that change as it is not important to keep the comparison with the baseline anymore.
Here are the things I think we should look at: 1 - variations on the number of cycles (instead of simulation time, that can vary quite a bit and is not really architecturally meaningful) 2 - variation on the critical path both on FPGA and the open pdk synthesis 3 - variation on the type checking + compilation time (4 - incompatibilities/porting work/variations on the symbolic evaluation time/use in already written proofs)
Maybe we can do a little experimentation for 1) and 2), I think the critical path is currently in decode on the equivalent pattern for the destination register and the scoreboard. Or at least that's what I remember, so it may move to go to the source instead, not sure.
For 3), hopefully it should not influence too much, but we have seen weirdness before, so it could be good to confirm it.
I think for 4), this could impact the proof we did for the enclave system, but we probably don't want to keep the code in sync anyway (and it is probably out of sync already), so I don't mind ignoring that dimension.
I can look at 1, 2, 3 but sadly not before august.