ibex
Investigate special case direct forwarding from loads
When using the writeback stage, any instruction directly following a load that uses the result of that load must stall for at least one cycle (more if the load takes more than one cycle to return its data) so the required data can be written to the register file. We do not forward the load data directly into the dependent instruction for timing reasons (the incoming read data would have to be fed straight into the ALU).
There are particular cases of direct forwarding of load data into dependent instructions that may be possible, where the instruction is doing something simple with the data.
A specific example would be branches using an equal/not-equal to zero condition or potentially the more general equal/not-equal to a register condition. Implementing this would require an extra comparator tied directly to the incoming memory data.
This can be useful in tight pointer-chasing loops, e.g.:
    while (list_node && (list_node->n != search_val))
        list_node = list_node->next;
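For reference, here is a self-contained version of that loop as a rough sketch. The `struct node` layout and the `list_find` name are assumptions made for illustration (the snippet above only implies the `n` and `next` fields). The comments mark where a load is consumed straight away by a branch condition, which is the case the proposed forwarding targets. The actual instruction sequence depends on the compiler and flags.

```c
#include <stddef.h>

/* Hypothetical node layout; the original snippet only implies n and next. */
struct node {
    int n;
    struct node *next;
};

/* Returns the first node whose n equals search_val, or NULL if there is none. */
struct node *list_find(struct node *list_node, int search_val)
{
    /*
     * Each iteration loads list_node->n and compares it against search_val
     * in the loop condition (an equal/not-equal to a register condition).
     * The load of list_node->next feeds the null check at the top of the
     * next iteration (an equal/not-equal to zero condition).
     */
    while (list_node && (list_node->n != search_val))
        list_node = list_node->next;
    return list_node;
}
```

Calling `list_find(head, 2)` on a list holding 1, 2, 3 would return a pointer to the second node; a value that is not in the list gives `NULL`.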
One of the things CoreMark does is such a pointer-chasing loop.
We should investigate how practical this is and what other simple forward cases may exist.
One issue with my suggested direct forwarding is that the outgoing instruction request would depend on the incoming memory data (so we'd have a data_rdata_i -> instr_req_o feedthrough).
We could break this path by only applying this optimisation to correctly predicted branches or not-taken branches. The direct forwarding would then only confirm that the branch prediction was correct, or that there is no need to branch, so nothing has to change in the instruction request. Everything else would fall back to the existing behaviour of stalling until the load data is written back to the RF. Of course this limits the performance upside and complicates the design.
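The restricted policy could be modelled roughly as below. This is a behavioural sketch only, not RTL, and it says nothing about how the timing path would actually be cut. The names `fwd_action` and `load_branch_action` are made up for illustration, and a core without a branch predictor is assumed to behave as if `predicted_taken` is always false.

```c
#include <stdbool.h>

/* Hypothetical outcome labels for a branch that directly follows a load. */
enum fwd_action {
    FWD_NO_STALL,     /* comparator result only confirms the current fetch path */
    FWD_STALL_FOR_RF  /* fall back: wait for the load data to reach the RF */
};

/*
 * predicted_taken: what the front end assumed when fetching past the branch.
 * actual_taken:    outcome from the extra comparator on the incoming load data.
 */
enum fwd_action load_branch_action(bool predicted_taken, bool actual_taken)
{
    /* The fetch path is already right, so no new instruction request is needed. */
    if (predicted_taken == actual_taken)
        return FWD_NO_STALL;

    /* A redirect is required; use the existing stall-and-writeback route. */
    return FWD_STALL_FOR_RF;
}
```

With no predictor, only not-taken branches avoid the stall in this model, which is where the limit on the performance upside comes from.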