riscv-openocd icon indicating copy to clipboard operation
riscv-openocd copied to clipboard

Incorrect work of examine() in case of halt groups

Open kr-sc opened this issue 2 years ago • 14 comments

I launch spike simulator with 4 harts:

spike \
  --rbb-port=7777 \
  --isa=rv64gc \
  --dm-progsize=6 \
  --dm-sba=0 \
  --dm-abstract-rti=0 \
  -p4 \
  -m0x80000000:0xa00000 \
  <some_elf_file>

then the following script:

for LOG in {1..2}; \
  do openocd \
    -c "adapter driver remote_bitbang" \
    -c "remote_bitbang host 127.0.0.1" \
    -c "remote_bitbang port 7777" \
    -c 'gdb_port disabled' \
    -f spike_smp4_rtoshw4.cfg \
    -c "log_output ${LOG}.log" \
    -c init \
    -c targets \
    -c shutdown -d3; \
done;
spike_smp4_rtoshw4.cfg
proc init_targets {} {
	set _CHIPNAME riscv
	jtag newtap $_CHIPNAME cpu -irlen 5 -expected-id 0x10e31913

	set _TARGETNAME_0 $_CHIPNAME.cpu0
	set _TARGETNAME_1 $_CHIPNAME.cpu1
	set _TARGETNAME_2 $_CHIPNAME.cpu2
	set _TARGETNAME_3 $_CHIPNAME.cpu3

	target create $_TARGETNAME_0 riscv -chain-position $_CHIPNAME.cpu -coreid 0 -rtos hwthread
	target create $_TARGETNAME_1 riscv -chain-position $_CHIPNAME.cpu -coreid 1 -rtos hwthread
	target create $_TARGETNAME_2 riscv -chain-position $_CHIPNAME.cpu -coreid 2 -rtos hwthread
	target create $_TARGETNAME_3 riscv -chain-position $_CHIPNAME.cpu -coreid 3 -rtos hwthread

	target smp $_TARGETNAME_0 $_TARGETNAME_1 $_TARGETNAME_2 $_TARGETNAME_3
}

Result of the first iteration:

TargetName         Type       Endian TapName            State
--  ------------------ ---------- ------ ------------------ ------------
 0  riscv.cpu0         riscv      little riscv.cpu          running
 1  riscv.cpu1         riscv      little riscv.cpu          running
 2  riscv.cpu2         riscv      little riscv.cpu          running
 3* riscv.cpu3         riscv      little riscv.cpu          running

Result of the second iteration:

TargetName         Type       Endian TapName            State
--  ------------------ ---------- ------ ------------------ ------------
 0  riscv.cpu0         riscv      little riscv.cpu          running
 1  riscv.cpu1         riscv      little riscv.cpu          unknown
 2  riscv.cpu2         riscv      little riscv.cpu          unknown
 3* riscv.cpu3         riscv      little riscv.cpu          unknown

Also, if you add additional printing in file riscv-013.c in function riscv013_get_hart_state(), you can see that dmstatus values are different between two runs.

From 8ca3f30a292977f7b94ba057a86fab7459b8f2c8 Mon Sep 17 00:00:00 2001
From: Kirill Radkin <[email protected]>
Date: Mon, 7 Aug 2023 12:57:39 +0300
Subject: [PATCH] addtional printing

---
 src/target/riscv/riscv-013.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/target/riscv/riscv-013.c b/src/target/riscv/riscv-013.c
index 24713a93c..70f905c35 100644
--- a/src/target/riscv/riscv-013.c
+++ b/src/target/riscv/riscv-013.c
@@ -2496,6 +2496,7 @@ static int riscv013_get_hart_state(struct target *target, enum riscv_hart_state
 		return ERROR_FAIL;
 
 	uint32_t dmstatus;
+	LOG_INFO("DMSTATUS == %" PRIx32, dmstatus);
 	if (dmstatus_read(target, &dmstatus, true) != ERROR_OK)
 		return ERROR_FAIL;
 	if (get_field(dmstatus, DM_DMSTATUS_ANYHAVERESET)) {
-- 
2.34.1

If I understand it correctly, this situation is connected to how examine() is implemented in riscv-013.c. At the end of examine() OpenOCD adds all targets to a halt group (because we have smp configuration in this case). It leads to an interesting situation: harts are halted and resumed one by one (as it should be) when examined during the first connection, but when examined during the second connection harts are already merged into a halt group. Because of that when OpenOCD tries to halt the first target, all harts become in a halted state.

kr-sc avatar Aug 07 '23 09:08 kr-sc

@timsifive, can you confirm that my reasoning is correct?

kr-sc avatar Aug 25 '23 11:08 kr-sc

@kr-sc I haven't yet tried to reproduce it locally. What values of dmstatus are you getting, please? Are they unexpected?

Also your added LOG_INFO print in the patch is put before the dmstatus_read() call - can you check if that is maybe a typo?

My expectation is that the examination should pass even if the harts are members a halt/resume group. For the examination, it is important that the given hart being examined can be halted for a brief period of time (and then resumed, if running previously).

If in that process some other hart(s) get halted/resumed also (due to the halt/resume groups), that should not break the examination as such.

JanMatCodasip avatar Aug 29 '23 08:08 JanMatCodasip

I haven't yet tried to reproduce it locally. What values of dmstatus are you getting, please? Are they unexpected?

I get values that differs only in all/any resumeack bits, maybe it's not important here.

Also your added LOG_INFO print in the patch is put before the dmstatus_read() call - can you check if that is maybe a typo?

Oh, it should be after dmstatus_read(), of course.

My expectation is that the examination should pass even if the harts are members a halt/resume group. For the examination, it is important that the given hart being examined can be halted for a brief period of time (and then resumed, if running previously).

Main problem that when we try to halt (if it's was not halted, for example) only first target during second examine() call, we halt all targets which was added in halt group during previous examiine() call and we lost information about hart's state before second examine.

kr-sc avatar Aug 30 '23 08:08 kr-sc

Can you add -d to your OpenOCD invocations, capture the output, and attach it here?

timsifive avatar Aug 31 '23 00:08 timsifive

1.log 2.log

After second examine targets (1-3) in halted state (not unknown, maybe somebody fix it in new versions), but before first connection all of them were in running state.

kr-sc avatar Sep 05 '23 12:09 kr-sc

P.S. unknown hart state was fixed here 198edca6d0516c979c787b30a99cc93d2342221a

kr-sc avatar Sep 05 '23 12:09 kr-sc

The difference seems to be simply that in the first log the harts are running while in the second log they are halted.

@aap-sc do you have time to look at this? It looks related to (or maybe even fix by) #900.

timsifive avatar Sep 08 '23 16:09 timsifive

Wait, is @kr-sc's latest comment saying this was fixed by #900? If so, can we just close this?

timsifive avatar Sep 08 '23 16:09 timsifive

@timsifive not quite. What @kr-sc says is that unknown status was fixed. However, the underlying issue is still there.

It's just that instead of unknown status, harts will be halted, instead of unknown. I'll try to produce an updated version of logs. My understanding is that this is related to how halt-groups are handled by OpenOCD during examine: when harts are combined in a halt group - this is an invasive change which is not reverted upon disconnect/re-examine). It's been a while since I took a glance at this one, so my memory may fail me.

Actually, @kr-sc was going to look at this one eventually, however since halt groups are not supported by Syntacore HW this task is not of the highest priority. This issue was filed just to document a problem.

That being said - I'll double check the situation.

aap-sc avatar Sep 08 '23 16:09 aap-sc

The difference seems to be simply that in the first log the harts are running while in the second log they are halted.

Yes, but in both runs we connect to same simulator session, which targets were in running state before first connection. After second, they become in halted state. I think, that it is incorrect behaviour of OpenOCD, becauese we didn't send any commands during these two connections, but we changed hart state.

kr-sc avatar Sep 18 '23 10:09 kr-sc

To summarize:

  1. When first connecting, all harts are running. After examine() all harts are still running. examine() did put all harts in the same halt group.
  2. When connecting a second time, all harts are also running. However, during examine(), halting the first hart halts all the other harts as well (because they're in the same halt group). After examine(), hart 0 is running (because it was running when examine() first looked at it), but the remaining harts are halted (because they were halted when examine() first looked at them).

I agree that's a bug. Not sure when I'll get to it. One solution might be to clear all halt groups before examining the first hart. (But examine() is called per "target" without any extra context, so it's not as straightforward as it might be.)

timsifive avatar Sep 25 '23 20:09 timsifive

One solution might be to clear all halt groups before examining the first hart.

Halt groups are used to implement OpenOCD smp target, so I think we should clear halt groups before OpenOCD disconnected from target.

kr-sc avatar Sep 26 '23 09:09 kr-sc

so I think we should clear halt groups before OpenOCD disconnected from target.

I think we should reset halt groups even upon initial connection. Though it makes examine even more invasive :(, however the resulting behavior is more predictable.

aap-sc avatar Sep 26 '23 11:09 aap-sc

Looking at [3.14.17. Debug Module Control and Status 2 (dmcs2, at 0x32)] it seems like halt groups should be reset when DM is reset, since the reset value for dtmcs2.group is preset, and dtmcs2.group represents the halt group (dmcs2.grouptype resets to 0 (halt))of the hart corresponding to dmcontrol.hartsel = 0 (also the reset value).

OpenOCD resets the DM upon connection, so IMHO the halt groups should be reset. Therefore this is a Spike issue.

See https://github.com/riscv/riscv-debug-spec/issues/1146 for the discussion whether my interpretation is correct.

en-sc avatar Sep 22 '25 12:09 en-sc