MulDiv unit inconsistent behaviour, potential performance bug

Open sammy17 opened this issue 3 years ago • 0 comments

Type of issue: bug report

Impact: unknown

Development Phase: request

Other information

While testing a new dynamic verification method, I noticed unexpected behavior involving div/rem instructions in the Rocket core. This may be a side-channel or performance bug in the MulDiv unit.

When I execute the following two instructions, the divu instruction takes more clock cycles to commit than it does when the order of the two instructions is swapped.

rem     s0, s4, a6
divu    s3, s11, s5

The program with the order swapped:

divu    s3, s11, s5
rem     s0, s4, a6

The two programs differ by 64 cycles, which is a considerable timing difference. Please let me know if this is the result of a known design decision.

Otherwise, because there are no data hazards between the two instructions, I would expect the two program variants to have the same cycle count.

If the current behavior is a bug, please provide the steps to reproduce the problem: tests.zip

# test_divu_1.elf contains the first variant; test_divu_2.elf contains the second variant.

./emulator-freechips.rocketchip.system-freechips.rocketchip.system.DefaultConfig +verbose test_divu_1.elf 2>&1 | spike-dasm |& tee run.divu.1.log
./emulator-freechips.rocketchip.system-freechips.rocketchip.system.DefaultConfig +verbose test_divu_2.elf 2>&1 | spike-dasm |& tee run.divu.2.log

# Compare the two log files and check the cycle count at a specific PC after the code snippet, e.g. pc=[00000000800002dc]
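The comparison step can be scripted. Below is a minimal sketch in Python, not part of the attached tests. It assumes each trace line in the `+verbose` log starts with `C<hart>: <cycle> [<valid>] pc=[<hex>]`; only the `pc=[...]` field is confirmed by the logs above, so adjust the pattern if your trace lines differ. The function names `first_cycle_at_pc` and `cycle_delta` are hypothetical helpers introduced for illustration.

```python
import re

# ASSUMPTION: trace lines look like
#   "C0:        123 [1] pc=[00000000800002dc] ..."
# i.e. hart id, cycle counter, valid/commit flag, then the PC in hex.
TRACE_RE = re.compile(r"^C\s*(\d+):\s+(\d+)\s+\[(\d)\]\s+pc=\[([0-9a-fA-F]+)\]")


def first_cycle_at_pc(lines, pc, committed_only=True):
    """Return the cycle count of the first trace line at `pc`, or None.

    `lines` is any iterable of strings (an open log file works).
    `pc` is a hex string such as "00000000800002dc".
    With committed_only=True, lines whose valid flag is not 1 are skipped.
    """
    target = int(pc, 16)
    for line in lines:
        match = TRACE_RE.match(line)
        if match is None:
            continue  # not a trace line (e.g. emulator banner output)
        _hart, cycle, valid, line_pc = match.groups()
        if committed_only and valid != "1":
            continue
        if int(line_pc, 16) == target:
            return int(cycle)
    return None


def cycle_delta(log_a, log_b, pc):
    """Cycle count at `pc` in log_a minus the cycle count at `pc` in log_b."""
    cycle_a = first_cycle_at_pc(log_a, pc)
    cycle_b = first_cycle_at_pc(log_b, pc)
    if cycle_a is None or cycle_b is None:
        raise ValueError(f"pc {pc} not found in one of the logs")
    return cycle_a - cycle_b
```

For example, `cycle_delta(open("run.divu.1.log"), open("run.divu.2.log"), "00000000800002dc")` would report how many more cycles the first variant needs to reach that PC, provided the assumed line format holds.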

What is the current behavior? The cycle count depends on the order of the rem and divu instructions even though there are no data hazards between them.

What is the expected behavior? Ideally, both variants of the program should take the same number of cycles.

Please tell us about your environment:
- version: b503f8ac28a497b2463ffbac84bfe66533ace0bb
- OS: Linux 3.10.0-1160.62.1.el7.x86_64 #1 SMP x86_64 x86_64 x86_64 GNU/Linux

What is the use case for changing the behavior? To reduce clock cycle count in certain scenarios (and remove a potential side-channel).

sammy17, Aug 09 '22 19:08