
`cv32e40x` bug: coprocessor issue information (XIF) not sampled by the CPU

Open StMiky opened this issue 1 year ago • 0 comments

Summary

The cv32e40x CPU attempts to offload an instruction to a coprocessor connected through the CORE-V eXtension Interface (XIF issue_req) whenever its instruction decoder fails to recognize it (or when the instruction is a CSR instruction). Currently, instruction offloading does not account for backpressure from the CPU execution stage. As a result, the feedback from the coprocessor (e.g., the intention to write the result of the offloaded instruction back to one of the CPU scalar GPRs x0-x31) may not be correctly propagated to the ID-EX pipeline register and, therefore, to the following stages.

Detailed example

The CPU execution stage applies backpressure to the instruction decoding stage (i.e., ex_ready=0) whenever execution takes more than one cycle, which is the case, for example, for memory accesses (load/store instructions). Take for instance the following assembly snippet:

...
li t0, 42
lw t1, 0(zero)
coproc.add t2, t1, t0
sw t2, 4(zero)
...

Here, coproc.add is a (completely useless and redundant) instruction offloaded to a coprocessor that computes the sum t0+t1 and sends the result back to the CPU, to be stored in t2. coproc.add gets offloaded to the coprocessor as soon as it is decoded, which is the cycle in which the previous lw instruction is being executed. Usually, the coprocessor's instruction decoding stage can answer an offloading request as soon as it is issued by the CPU, and the response data is retired the cycle after. In our example, the CPU execution stage is not ready to accept a new instruction when the coproc.add instruction is offloaded, because it is still processing the previous lw. Therefore, the coprocessor response data does not get sampled in the ID-EX pipeline register until the next cycle, when it is no longer valid, as shown in the following timing diagram:

[Timing diagram: issue]

Information about the offloaded instruction that comes from the CPU IF-ID pipeline register gets sampled correctly, so the instruction is actually propagated through the CPU stages. However, crucial information in the coprocessor issue response, such as the intention to write data back to the CPU GPRs or the possibility of raising exceptions, is not. One consequence is that the result data from the coprocessor does not get written back to the CPU GPRs. More severe consequences may appear when the coprocessor uses the CPU load-store unit or the bus, possibly triggering errors or exceptions that would be ignored by the CPU.
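The failure mode described above can be reproduced in a minimal cycle-level sketch in Python. This is a toy model, not the cv32e40x RTL: `IssueResp`, its field names, and `sample_unguarded` are hypothetical, and the model assumes that the coprocessor answers in the same cycle as the request and drives an empty response once the handshake is over.

```python
from dataclasses import dataclass


@dataclass
class IssueResp:
    """Subset of the coprocessor issue response (illustrative field names)."""
    accept: bool = False
    writeback: bool = False


def sample_unguarded(ex_ready_trace):
    """Toy model of the current behaviour (hypothetical helper).

    ex_ready_trace[i] is the EX-stage ready flag in cycle i, counted from the
    cycle in which the offloaded instruction is decoded.
    Returns (cycle, response) as latched by the ID-EX register, or None if
    the EX stage never becomes ready within the trace.
    """
    issued = False
    for cycle, ex_ready in enumerate(ex_ready_trace):
        # The request is raised as soon as the instruction is decoded,
        # regardless of ex_ready.
        issue_valid = not issued
        # Assumption: the coprocessor answers in the same cycle and drives
        # an all-zero response after the handshake.
        resp = IssueResp(accept=True, writeback=True) if issue_valid else IssueResp()
        issued = True
        # The ID-EX register only updates when the EX stage is ready.
        if ex_ready:
            return cycle, resp
    return None


# No backpressure: the response is latched in the handshake cycle.
print(sample_unguarded([True]))
# lw still in EX during decode: the register latches an empty response.
print(sample_unguarded([False, True]))
```

In the second call the instruction still reaches the execution stage, but with the writeback flag cleared, which corresponds to the lost GPR write described above.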

Proposed solution

A quick and dirty solution to get the information from the coprocessor correctly sampled into the ID-EX stage is to attempt instruction offloading through the XIF only when the execution stage is ready to accept a new instruction. This would result in the following fixed timing diagram:

[Timing diagram: solution]
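The gating idea can be sketched with a toy cycle-level model in Python. The function name `sample_gated` is hypothetical, and the model assumes that the coprocessor answers in the same cycle as the request with its writeback flag set; it is only meant to illustrate the timing, not the actual RTL change.

```python
def sample_gated(ex_ready_trace):
    """Toy model of the proposed workaround (hypothetical helper).

    ex_ready_trace[i] is the EX-stage ready flag in cycle i, counted from the
    cycle in which the offloaded instruction is decoded.
    Returns (issue_cycle, writeback_seen_by_id_ex), or None if the EX stage
    never becomes ready within the trace.
    """
    for cycle, ex_ready in enumerate(ex_ready_trace):
        # Proposed gating: raise the issue request only when EX can accept
        # a new instruction.
        issue_valid = ex_ready
        # Assumption: the coprocessor answers in the same cycle and wants
        # to write back a result.
        resp_writeback = issue_valid
        if ex_ready:
            # Handshake and ID-EX sampling now happen in the same cycle.
            return cycle, resp_writeback
    return None


# lw occupies EX for one extra cycle: the offload is delayed to cycle 1,
# and the writeback flag is captured together with the instruction.
print(sample_gated([False, True]))
```

The cost is visible in the returned issue cycle: the request is delayed by as many cycles as the execution stage stays busy.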

:warning: NOTE : this is not an optimal solution, as it effectively stalls the coprocessor for one cycle (possibly more if the previous instruction has a longer execution latency). A better way would be to offload an instruction to the coprocessor as soon as it is fetched and decoded (as it is now), and sample the coprocessor response data in a dedicated register (different from the ID-EX pipeline register) while waiting for the execution stage to be ready. However, the cv32e40x maintainers are certainly better suited to defining and implementing a proper fix, so I would wait for their intervention on the core.
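For comparison, the dedicated-register alternative mentioned in the note might behave as in the following toy Python model. This is only a sketch under assumptions: `sample_buffered` and the `held_*` variables are hypothetical names, and the coprocessor is assumed to answer in the request cycle with its writeback flag set.

```python
def sample_buffered(ex_ready_trace):
    """Toy model of the holding-register alternative (hypothetical helper).

    ex_ready_trace[i] is the EX-stage ready flag in cycle i, counted from the
    cycle in which the offloaded instruction is decoded.
    Returns (issue_cycle, sample_cycle, writeback_seen_by_id_ex), or None if
    the EX stage never becomes ready within the trace.
    """
    held_valid = False      # holding register contains a response
    held_writeback = False  # buffered writeback flag
    issue_cycle = None
    for cycle, ex_ready in enumerate(ex_ready_trace):
        # Offload as soon as the instruction is decoded (as it is now).
        issue_valid = issue_cycle is None
        # Assumption: the live response is only meaningful in the handshake cycle.
        resp_writeback = issue_valid
        if issue_valid:
            issue_cycle = cycle
        # ID-EX input: live response during the handshake, held copy afterwards.
        writeback = resp_writeback if issue_valid else (held_valid and held_writeback)
        if ex_ready:
            return issue_cycle, cycle, writeback
        if issue_valid:
            # EX is busy: keep the response until it can be sampled.
            held_valid, held_writeback = True, resp_writeback
    return None


# The offload still happens in cycle 0; the flag survives until cycle 1.
print(sample_buffered([False, True]))
```

In this model the coprocessor receives the instruction without delay, at the price of one extra register for the buffered response.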

A pull request with the proposed fix implemented is being prepared and tested in HEEPerator, and will be linked here soon.

StMiky, Aug 11 '23 12:08