coreblocks icon indicating copy to clipboard operation
coreblocks copied to clipboard

Internal Interrupt Controller

Open piotro888 opened this issue 10 months ago • 7 comments

Closes #599

Implements priv spec core internal interrupt controller operating on CSRs. Details in docstrings.

Depends on #652 , needs rebase to reflect changes and clean-up from it after it is approved.

piotro888 avatar Apr 11 '24 15:04 piotro888

The cause of the problem is Forwarder inside FetchResume Unifier key. It can illegally store resume call to next fetcher stall.

I'm still not sure how it happened that call with resume_from_exception=0 was stored in forwarder instead of resume_from_exception=1 that should be always be later (called from Retirement), and what is the best way to fix it.

EDIT: Unifer key collects only entries form FUs (with resume from exception=0) and Retirement calls fetch directly, so stored value makes sense, but still Retirement call coming first to block the forwader is weird

piotro888 avatar Apr 15 '24 17:04 piotro888

I am not sure if I see where exactly the problem is. I looked at the execution of TestCoreInterrupt_0_interrupt_asm and it seems that the core works fine, at least from the perspective of the instruction stream it is given. I didn't dig into what it exactly executes and the test itself

Can you provide cycle at which there is a bug? And what is the expected behavior there?

xThaid avatar Apr 15 '24 17:04 xThaid

@xThaid Ok, I found it. The problem:

  1. Core is stalled due to CSR instruction
  2. Misprediction happens
  3. CSRUnit calls FetchResume with resume_from_exception=0 that normally should be ignored, but Fetch is still doing something and resume method is not yet ready, so this call waits inside Unifier in Forwarder.
  4. Core completes flushing, and Retirement tries to call resume with resume_from_excpetion=1, but the method blocks because fetcher is still not ready
  5. Fetcher becomes ready, due to random priority resume call from Retirement wins with Forwarder.
  6. Fetch is unstalled, and resume method is not ready again
  7. Another unsafe instruction arrives
  8. Forwader in Unifier still holds the old call value!!! When resume becomes ready, fetch is resumed to the old PC.

Setting priority between Retirement.CORE_RESUME and Unifier, to Unifier.Forwarder should fix the issue, so all calls with resume_form_excpetion=0 arrive before call from the Retirement (with resume_form_excpetion=1). I need to find a way how to declare it cleanly

piotro888 avatar Apr 15 '24 17:04 piotro888

Time ~707200ns

piotro888 avatar Apr 15 '24 17:04 piotro888

I think that best solution would be to split resume into resume_from_exception and resume_from_unsafe with conflict priority explicitly defined in Fetch

piotro888 avatar Apr 15 '24 17:04 piotro888

OK, thanks, now I can see it.

I wonder though, why do we even allow for speculatively resuming the fetch unit? If we are precommiting a CSR instruction without side effects, (in my opinion) it must not call fetch resume - this is clearly a side effect.

This fixes the failing tests:

diff --git a/coreblocks/func_blocks/csr/csr.py b/coreblocks/func_blocks/csr/csr.py
index 3e203caf..5191ea37 100644
--- a/coreblocks/func_blocks/csr/csr.py
+++ b/coreblocks/func_blocks/csr/csr.py
@@ -103,6 +103,7 @@ class CSRUnit(FuncBlock, Elaboratable):
         done = Signal()
         accepted = Signal()
         exception = Signal()
+        can_resume = Signal()
         precommitting = Signal()
 
         current_result = Signal(self.gen_params.isa.xlen)
@@ -205,7 +206,7 @@ class CSRUnit(FuncBlock, Elaboratable):
 
         @def_method(m, self.get_result, done)
         def _():
-            m.d.comb += accepted.eq(1)
+            m.d.comb += accepted.eq(can_resume)
             m.d.sync += reserved.eq(0)
             m.d.sync += instr.valid.eq(0)
             m.d.sync += done.eq(0)
@@ -249,6 +250,7 @@ class CSRUnit(FuncBlock, Elaboratable):
         @def_method(m, self.precommit)
         def _(rob_id: Value, side_fx: Value):
             with m.If(instr.rob_id == rob_id):
+                m.d.sync += can_resume.eq(side_fx)
                 m.d.comb += precommitting.eq(1)
                 m.d.comb += exe_side_fx.eq(side_fx)
 

Current setup seems to be broken anyway. csr unit sets the ready signal for method fetch_resume whenever get_result is called. However, if the unifier forwarder is not ready (another func unit inserted something there), fetch_resume won't run and its ready signal will be gone.

edit: it seems that the priv unit also tries to resume the fetch unit even when the instruction was precommited without side effects.

xThaid avatar Apr 15 '24 18:04 xThaid

In both cases besides side_fx, it also needs to check if exception is not going to be reported on that instruction, otherwise the same conflict could happen if resume is blocked.

Current setup seems to be broken anyway. csr unit sets the ready signal for method

It would work, because there is at most one unsafe instruction at time in the core (that call resume via Forwarder)

But it would still break when we would introduce interrupt delay enforcing via insertion of extra instruction at end of the pipeline. Unsafe instruction on precommit stage with exception clear, could still conflict with that interrupt. So I think that solution with separate resume methods would fit better in the terms of cleaner logic later.

piotro888 avatar Apr 17 '24 15:04 piotro888

Should be finally ready to merge ~~EDIT: wfi test fail, I will fix it tomorrow, rest should be ready to review~~

piotro888 avatar May 21 '24 18:05 piotro888