x-heep
`cv32e40x` bug: coprocessor issue information (XIF) not sampled by the CPU
Summary
The `cv32e40x` CPU attempts to offload an instruction to a coprocessor connected through the CORE-V eXtension Interface (XIF `issue_req`) whenever its instruction decoder fails to recognize it (or when the instruction is a CSR instruction). Currently, instruction offloading does not check for backpressure from the CPU execution stage. As a result, the feedback information from the coprocessor (e.g., the intention to write back the offloaded instruction result to one of the CPU scalar GPRs `x0`-`x31`) may not be correctly propagated to the ID-EX pipeline register and, therefore, to the following stages.
Detailed example
The CPU execution stage applies backpressure to the instruction decoding stage (i.e., `ex_ready=0`) whenever execution takes more than one cycle, which is the case, for example, for memory accesses (`load`/`store` instructions). Take for instance the following assembly snippet:
...
li t0, 42
lw t1, 0(zero)
coproc.add t2, t1, t0
sw t2, 4(zero)
...
Where `coproc.add` is a (completely useless and redundant) instruction offloaded to a coprocessor that computes the sum `t0+t1` and sends the result back to the CPU, to be stored in `t2`. `coproc.add` gets offloaded to the coprocessor as soon as it is decoded, in the same cycle in which the previous `lw` instruction is being executed. Usually, the coprocessor's instruction decoding stage can answer an offloading request as soon as it is issued by the CPU, and the response data is retired the cycle after. In our example, the CPU execution stage is not ready to accept a new instruction when the `coproc.add` instruction is offloaded, because it is still processing the previous `lw`. Therefore, the coprocessor response data does not get sampled in the `ID-EX` pipeline register until the next cycle, when it is no longer valid, as shown in the following timing diagram:
Information about the offloaded instruction that comes from the CPU IF-ID pipeline register gets sampled correctly, so the instruction is actually propagated through the CPU stages. However, crucial information in the coprocessor issue response, like the intention to write back data to the CPU GPRs or the possibility of raising exceptions, is not. One consequence is that the result data from the coprocessor does not get written back to the CPU GPRs. More severe consequences may appear when the coprocessor uses the CPU load-store unit or the bus, possibly triggering errors or exceptions that would be ignored by the CPU.
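The failure mode can be reproduced with a toy cycle-level model in Python. This is only a sketch under simplifying assumptions, not the actual RTL behaviour: the issue response is assumed to be valid only in the cycle in which the request is issued, and the ID-EX register is assumed to sample whatever is on the interface in the first cycle where `ex_ready` is set. `IssueResp` and `sample_id_ex_buggy` are hypothetical names, and the `accept`/`writeback` fields only loosely mirror the XIF issue response.

```python
from typing import NamedTuple, Optional, Sequence


class IssueResp(NamedTuple):
    """Subset of a coprocessor issue response (field names are assumptions)."""
    accept: bool = False
    writeback: bool = False


def sample_id_ex_buggy(ex_ready_trace: Sequence[bool]) -> Optional[IssueResp]:
    """Return the issue response captured by the ID-EX register.

    ex_ready_trace[i] is the value of ex_ready in cycle i, where cycle 0 is
    the cycle in which the offloaded instruction is decoded.
    """
    for cycle, ex_ready in enumerate(ex_ready_trace):
        # The request is issued in cycle 0 regardless of ex_ready, and the
        # response is assumed to be valid only during that same cycle.
        if cycle == 0:
            live_resp = IssueResp(accept=True, writeback=True)
        else:
            live_resp = IssueResp()  # response no longer valid
        if ex_ready:
            # ID-EX samples whatever is on the interface in this cycle.
            return live_resp
    return None  # the execution stage never became ready in this trace


# lw still occupies EX in cycle 0, so ID-EX samples one cycle too late.
print(sample_id_ex_buggy([False, True]))
# Without backpressure the response is sampled while it is still valid.
print(sample_id_ex_buggy([True]))
```

In the first call the `writeback` flag is lost, which corresponds to the missing GPR write-back described above.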
Proposed solution
A quick and dirty solution to get the information from the coprocessor correctly sampled into the ID-EX stage is to attempt instruction offloading through the XIF only when the execution stage is ready to accept a new instruction. This would result in the following fixed timing diagram:
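As a rough illustration of the gating idea, the toy Python model below issues the request only in a cycle where `ex_ready` is set, so the response is sampled in the same cycle in which it is valid. The function name `issue_cycle_gated` and the single-cycle response validity are assumptions made for this sketch; the real change would live in the core's RTL.

```python
from typing import Optional, Sequence, Tuple


def issue_cycle_gated(ex_ready_trace: Sequence[bool]) -> Tuple[Optional[int], bool]:
    """Return (cycle in which the request is issued, response sampled?).

    ex_ready_trace[i] is the value of ex_ready in cycle i, where cycle 0 is
    the cycle in which the offloaded instruction is decoded.
    """
    for cycle, ex_ready in enumerate(ex_ready_trace):
        issue_valid = ex_ready  # hypothetical gating of the issue request
        if issue_valid:
            # The response is assumed valid in the issue cycle, and ID-EX
            # samples in that same cycle because ex_ready is set.
            return cycle, True
    return None, False  # never issued within this trace


# With lw in EX during cycle 0, the offload is delayed to cycle 1.
print(issue_cycle_gated([False, True]))
```

The returned cycle index makes the cost visible: every cycle of backpressure delays the offload by one cycle.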
:warning: NOTE: this is not an optimal solution, as it effectively stalls the coprocessor for one cycle (possibly more if the previous instruction has a longer execution latency). A better way would be to offload an instruction to the coprocessor as soon as it is fetched and decoded (as it is now), and sample the coprocessor response data in a dedicated register (different from the ID-EX pipeline register) while waiting for the execution stage to be ready. However, `cv32e40x` maintainers are certainly better suited for defining and implementing a proper fix, so I would wait for their intervention on the core.
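For comparison, the dedicated-register alternative can be sketched with the same kind of toy Python model. The request is still issued in the decode cycle, but the response is captured in a holding register (`held`, a hypothetical name) and handed to ID-EX once `ex_ready` is set. As before, the single-cycle response validity and the `accept`/`writeback` field names are assumptions of this sketch rather than a description of the actual core.

```python
from typing import NamedTuple, Optional, Sequence


class IssueResp(NamedTuple):
    """Subset of a coprocessor issue response (field names are assumptions)."""
    accept: bool = False
    writeback: bool = False


def sample_id_ex_buffered(ex_ready_trace: Sequence[bool]) -> Optional[IssueResp]:
    """Return the issue response that reaches ID-EX when it is buffered.

    ex_ready_trace[i] is the value of ex_ready in cycle i, where cycle 0 is
    the cycle in which the offloaded instruction is decoded and issued.
    """
    held: Optional[IssueResp] = None  # dedicated holding register
    for cycle, ex_ready in enumerate(ex_ready_trace):
        if cycle == 0:
            # Capture the response in the cycle it is valid, regardless of
            # whether the execution stage is ready.
            held = IssueResp(accept=True, writeback=True)
        if ex_ready:
            # ID-EX samples the held copy instead of the live interface.
            return held
    return None  # the execution stage never became ready in this trace


# The response survives two cycles of backpressure.
print(sample_id_ex_buffered([False, False, True]))
```

In this model the coprocessor is never stalled, at the price of one extra register for the issue response.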
A pull request with the proposed fix implemented is being prepared and tested in HEEPerator, and will be linked here soon.