riscv-debug-spec icon indicating copy to clipboard operation
riscv-debug-spec copied to clipboard

How to reset the DM on the first connection on both 0.13 and 1.0 spec versions

Open en-sc opened this issue 10 months ago • 6 comments

  • Assume an external debugger (ED) is connecting to a hart for the first time.
  • ED wants to reset the DM in order to get it into a predictable state.
  • To reset the DM the ED needs to set dmcontrol.dmactive = 0.
  • However, the spec states [3.7. Abstract Commands]:

While an abstract command is executing ({abstractcs-busy} in {dm-abstractcs} is high), a debugger must not change {hartsel}...

The ED does not know whether an abstract command is executing, so it needs to read dmcontrol to know the value of dmcontrol.hartsel before writing dmcontrol = dmactive=0 | <the old hartsel value> (requesting DM reset without changing hartsel).

  • Assume the read of dmcontrol results in an error.
  • Since the ED does not know anything about the state of the DM yet, there are two possibilities:
    1. dmcontrol.dmactive is low and [3.14.2. Debug Module Control ]

    Any accesses to the module may fail.

    1. dmcontrol.dmactive is high and it's a legitimate DMI error (e.g. some issue with the connection).
  • There are two possible routes:
    1. ED assumes dmcontrol.dmactive is low and writes dmcontrol = <dmcontrol.dmactive=1> | <hartsel=0> -- may result in the change of hartel, so ED can't do that.
    2. ED reads dmcontrol again.

Now let's take into account RISC-V Debug Spec v0.13, since the ED does not yet know which version it is (dtmcs only distinguishes between 0.11 and (0.13 or 1.0)). In versions less then 0.13.3, according to [1.2.1.2. Incompatible Changes from 0.13 to 1.0]:

  1. Bump version to 3. #512 , Require debugger to poll dmactive after lowering it. #566

The HW is not required to eventually return something meaningful when dmcontroll.dmactive was set to low. So the ED may wait forever in the loop, reading dmcontrol and getting an error on RISC-V Debug Spec v0.13 - compliant HW.

Can this somehow be fixed?

en-sc avatar Apr 26 '24 11:04 en-sc

My proposal would be to specify the external debugger is permitted to perform dmactive <-- 0 at any time, without the need to retain any other dmcontrol bits, and regardless of the DM state -- so that the external debugger can bring the DM safely and reliably to a known state (but what happens to in-flight operations like abstract commands would be undefined).

I am inclined to believe that this was the intention of the debug spec, however strict interpretation of the spec text While an abstract command is executing (...), a debugger must not change {hartsel} (as @en-sc pointed out above) seem to contradict that.

JanMatCodasip avatar May 03 '24 14:05 JanMatCodasip

My proposal would be to specify the external debugger is permitted to perform dmactive <-- 0 at any time, without the need to retain any other dmcontrol bits, and regardless of the DM state...

The thing is, there may be a RISC-V Debug Spec v0.13 implemetation that complies to this strict interpretation of While an abstract command is executing (...), a debugger must not change {hartsel}. Since the ED does not know whether the target is 0.13 or 1.0 compliant at this poin, it should go with the strictest spec of the two and still read dmcontrol.hartsel in order not to change it while writing dmactive to zero.

I would suggest specifying that while dmactive is low, read of dmcontrol should not fail (may result in a busy response). However, It maybe to strict of a requirement for the HW.

en-sc avatar May 03 '24 15:05 en-sc

I maintain the opinion that

  • if a DM that is reset (via dmcontrol.dmactive <-- 0), and
  • at the same time change of hartsel bits by the external debugger disrupts the operation of the DM,

then the reset implementation in the DM can be considered defective, because in that case dmactive=0 does not clear the state of the debug module, as required by any version of the debug spec.

Since the ED does not know whether the target is 0.13 or 1.0 compliant at this poin, it should go with the strictest spec of the two (...)

My recommendation for the EDs would be to go with the most robust approach in this case - that is to perform the DM reset always. If the ED detects the case when there would be concern about the hartsel bits, the ED can warn the user and possibly provide a configuration option to the user to alter the behavior.

JanMatCodasip avatar May 03 '24 16:05 JanMatCodasip

I'we created a flowchart that would hopefully help to better describe the issue. dm-reset drawio

en-sc avatar May 14 '24 16:05 en-sc

  1. Asserting dmactive should always be allowed, and reset hartsel. dmactive is a reset signal.
  2. Even in 0.13, dmactive is listed as R/W, so writing 0 should result in the bit becoming 0 eventually. I don't see an issue with polling it and the bit never becoming 0. But maybe I misunderstood your point.
  3. If there is no DM at the bus address, then the value will always read 0 and the debugger will perceive it as dmactive never goes high. That's not ideal, but I think it's good enough.

rtwfroody avatar May 16 '24 16:05 rtwfroody

  1. Asserting dmactive should always be allowed, and reset hartsel. dmactive is a reset signal.

I would to clarify this. Please, take a look at #1040.

  1. Even in 0.13, dmactive is listed as R/W, so writing 0 should result in the bit becoming 0 eventually. I don't see an issue with polling it and the bit never becoming 0. But maybe I misunderstood your point.

Oh, I see. I didn't consider this. Thanks!

  1. If there is no DM at the bus address, then the value will always read 0 and the debugger will perceive it as dmactive never goes high. That's not ideal, but I think it's good enough.

Initially, this issue was the least concerning to me. I think it can be mitigated by a descriptive error message on the ED side.

en-sc avatar May 17 '24 09:05 en-sc

Asserting dmactive should always be allowed, and reset hartsel. dmactive is a reset signal.

That would have been a better way to write the spec, but not what the spec says. Fixing this is not backwards compatible, and it's not worth it to make a backwards incompatible change to address the extreme corner case where the DM is executing an abstract command and also reading dmcontrol results in an error.

rtwfroody avatar Jul 19 '24 16:07 rtwfroody