cva6
cva6 copied to clipboard
frontend: Clear 'speculative' flag on replay
Prevent deadlock when replaying speculative instruction from nonidempotent memory region.
Consider the following scenario while executing from nonidempotent memory:
- A branch instruction is fetched =>
speculative_d = 1'b1 - The instruction queue is full,
replayis asserted - Once the instruction queue is
readyagain, the frontend sends a request to the icache to refetch the branch instruction - The icache blocks as
dreq.spec == 1'b1(and we do not fetch speculatively from nonidempotent memory) - Deadlock
Hence, the speculative register needs to be rolled back if the corresponding speculative instruction is not accepted by the instruction queue. Since the speculative instruction should always be the most recently fetched instruction (the icache stalls until the speculative instruction is resolved), clearing speculative on a replay should be accurate.
(Alternatively we could set speculative only if the instruction queue is ready, but I think this might be less accurate as the underlying assumption "!instr_queue_ready => instr will not enter the instruction queue" might not hold.)
I just realise that this solution might not be correct, as re-fetching from nonidempotent memory might be a violation. I guess a clean solution would be buffering the replayed instruction or fetching only if the issue queue will definitely not assert a replay (e.g. there is sufficient space). However, this looks like a bigger change... What do you think?
I think the underlying problem is that we do not want to allow execution from non-idempotent memory, I think that doesn't make sense. The problem I see is that we treat that pretty much the same at the moment.
Or do you see a scenario where we want to execute from non-idempotent memory?
I encountered the issue when executing from Occamy's SRAM. I agree that in this specific case, we probably want to mark the SRAM as idempotent, but I think in general it might be required to support execution from non-idempotent memory regions (not very sure what the RISC-V point on this is). For instance, the bootrom and debug module are currently marked as non-idempotent and execute regions in the default Ariane config: https://github.com/openhwgroup/cva6/blob/97172398ad5d3be22c127e4fa45d7b63731bbab4/tb/ariane_soc_pkg.sv#L77-L82
I can't think of a good use-case of fetching from non-idempotent memory and would therefore simply not allow it. Can we make that work in that case i.e., adding the appropriate rules in occamy and ariane_soc_pkg? The SPM in Occamy should definitely be idempotent!
I'm not so sure about adding that restriction, Florian. With the way PMAs have to be specified it is easier to set all non-main-memory as NC+NI. Saying you can't execute from an NI region means you then need to do a lot more planning of the regions to figure out where you're going to put things that are only NC vs NC+NI
Yes, ideally that would be the same region. But unless we have a proper fix for this (one where we can also assess performance and overhead on the uarch) it seems this is the best way to go.
@niwis Forgive me if I'm missing some details, but is it possible to still allow to execute from NI regions, but do not allow speculation? I think this is the current solution for the data cache, to fetch NI data non-speculatively.
As far as I'm concerned yes, but the question is how to implement this correctly. As described above, the current implementation may deadlock (which this PR tries to fix). Also, the current implementation might replay instructions fetched from an NI region if the instruction queue is full, violating the non-idempotence (which this PR does not address).
I also can't think of a case where it is really necessary to execute from an NI, so I'm ok with forbidding it entirely instead of digging into this rabbit hole.
Between debug units and things like our hardware SD driver, I think executing from NI isn't that abnormal. I would consider a change to specifically disallow this to be a breaking change for downstream users. Documenting and saying that it's discouraged or may deadlock would be preferable if there isn't a fix forthcoming.
Hello @niwis, do you confirm this PR can be closed ? regards
@JeanRochCoulon that's up to you. From my point of view, this issue is still open as CVA6 may still dead-lock when executing from NI memory (which is currently permitted). But surely there are other approaches to fix this.
Ok, can I ask you to describe the problem in the Issue part, it would be a shame to lose it. Maybe someone will take the steps to solve it later.
Thanks for having logged the issue.