Incorrect work of examine() in case of halt groups
I launch spike simulator with 4 harts:
spike \
--rbb-port=7777 \
--isa=rv64gc \
--dm-progsize=6 \
--dm-sba=0 \
--dm-abstract-rti=0 \
-p4 \
-m0x80000000:0xa00000 \
<some_elf_file>
then the following script:
for LOG in {1..2}; \
do openocd \
-c "adapter driver remote_bitbang" \
-c "remote_bitbang host 127.0.0.1" \
-c "remote_bitbang port 7777" \
-c 'gdb_port disabled' \
-f spike_smp4_rtoshw4.cfg \
-c "log_output ${LOG}.log" \
-c init \
-c targets \
-c shutdown -d3; \
done;
spike_smp4_rtoshw4.cfg
proc init_targets {} {
set _CHIPNAME riscv
jtag newtap $_CHIPNAME cpu -irlen 5 -expected-id 0x10e31913
set _TARGETNAME_0 $_CHIPNAME.cpu0
set _TARGETNAME_1 $_CHIPNAME.cpu1
set _TARGETNAME_2 $_CHIPNAME.cpu2
set _TARGETNAME_3 $_CHIPNAME.cpu3
target create $_TARGETNAME_0 riscv -chain-position $_CHIPNAME.cpu -coreid 0 -rtos hwthread
target create $_TARGETNAME_1 riscv -chain-position $_CHIPNAME.cpu -coreid 1 -rtos hwthread
target create $_TARGETNAME_2 riscv -chain-position $_CHIPNAME.cpu -coreid 2 -rtos hwthread
target create $_TARGETNAME_3 riscv -chain-position $_CHIPNAME.cpu -coreid 3 -rtos hwthread
target smp $_TARGETNAME_0 $_TARGETNAME_1 $_TARGETNAME_2 $_TARGETNAME_3
}
Result of the first iteration:
TargetName Type Endian TapName State
-- ------------------ ---------- ------ ------------------ ------------
0 riscv.cpu0 riscv little riscv.cpu running
1 riscv.cpu1 riscv little riscv.cpu running
2 riscv.cpu2 riscv little riscv.cpu running
3* riscv.cpu3 riscv little riscv.cpu running
Result of the second iteration:
TargetName Type Endian TapName State
-- ------------------ ---------- ------ ------------------ ------------
0 riscv.cpu0 riscv little riscv.cpu running
1 riscv.cpu1 riscv little riscv.cpu unknown
2 riscv.cpu2 riscv little riscv.cpu unknown
3* riscv.cpu3 riscv little riscv.cpu unknown
Also, if you add additional printing in file riscv-013.c in function riscv013_get_hart_state(), you can see that dmstatus values are different between two runs.
From 8ca3f30a292977f7b94ba057a86fab7459b8f2c8 Mon Sep 17 00:00:00 2001
From: Kirill Radkin <[email protected]>
Date: Mon, 7 Aug 2023 12:57:39 +0300
Subject: [PATCH] addtional printing
---
src/target/riscv/riscv-013.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/src/target/riscv/riscv-013.c b/src/target/riscv/riscv-013.c
index 24713a93c..70f905c35 100644
--- a/src/target/riscv/riscv-013.c
+++ b/src/target/riscv/riscv-013.c
@@ -2496,6 +2496,7 @@ static int riscv013_get_hart_state(struct target *target, enum riscv_hart_state
return ERROR_FAIL;
uint32_t dmstatus;
+ LOG_INFO("DMSTATUS == %" PRIx32, dmstatus);
if (dmstatus_read(target, &dmstatus, true) != ERROR_OK)
return ERROR_FAIL;
if (get_field(dmstatus, DM_DMSTATUS_ANYHAVERESET)) {
--
2.34.1
If I understand it correctly, this situation is connected to how examine() is implemented in riscv-013.c. At the end of examine() OpenOCD adds all targets to a halt group (because we have smp configuration in this case). It leads to an interesting situation: harts are halted and resumed one by one (as it should be) when examined during the first connection, but when examined during the second connection harts are already merged into a halt group. Because of that when OpenOCD tries to halt the first target, all harts become in a halted state.
@timsifive, can you confirm that my reasoning is correct?
@kr-sc I haven't yet tried to reproduce it locally. What values of dmstatus are you getting, please? Are they unexpected?
Also your added LOG_INFO print in the patch is put before the dmstatus_read() call - can you check if that is maybe a typo?
My expectation is that the examination should pass even if the harts are members a halt/resume group. For the examination, it is important that the given hart being examined can be halted for a brief period of time (and then resumed, if running previously).
If in that process some other hart(s) get halted/resumed also (due to the halt/resume groups), that should not break the examination as such.
I haven't yet tried to reproduce it locally. What values of dmstatus are you getting, please? Are they unexpected?
I get values that differs only in all/any resumeack bits, maybe it's not important here.
Also your added LOG_INFO print in the patch is put before the dmstatus_read() call - can you check if that is maybe a typo?
Oh, it should be after dmstatus_read(), of course.
My expectation is that the examination should pass even if the harts are members a halt/resume group. For the examination, it is important that the given hart being examined can be halted for a brief period of time (and then resumed, if running previously).
Main problem that when we try to halt (if it's was not halted, for example) only first target during second examine() call, we halt all targets which was added in halt group during previous examiine() call and we lost information about hart's state before second examine.
Can you add -d to your OpenOCD invocations, capture the output, and attach it here?
After second examine targets (1-3) in halted state (not unknown, maybe somebody fix it in new versions), but before first connection all of them were in running state.
P.S. unknown hart state was fixed here 198edca6d0516c979c787b30a99cc93d2342221a
The difference seems to be simply that in the first log the harts are running while in the second log they are halted.
@aap-sc do you have time to look at this? It looks related to (or maybe even fix by) #900.
Wait, is @kr-sc's latest comment saying this was fixed by #900? If so, can we just close this?
@timsifive not quite. What @kr-sc says is that unknown status was fixed. However, the underlying issue is still there.
It's just that instead of unknown status, harts will be halted, instead of unknown. I'll try to produce an updated version of logs. My understanding is that this is related to how halt-groups are handled by OpenOCD during examine: when harts are combined in a halt group - this is an invasive change which is not reverted upon disconnect/re-examine). It's been a while since I took a glance at this one, so my memory may fail me.
Actually, @kr-sc was going to look at this one eventually, however since halt groups are not supported by Syntacore HW this task is not of the highest priority. This issue was filed just to document a problem.
That being said - I'll double check the situation.
The difference seems to be simply that in the first log the harts are running while in the second log they are halted.
Yes, but in both runs we connect to same simulator session, which targets were in running state before first connection. After second, they become in halted state. I think, that it is incorrect behaviour of OpenOCD, becauese we didn't send any commands during these two connections, but we changed hart state.
To summarize:
- When first connecting, all harts are running. After examine() all harts are still running. examine() did put all harts in the same halt group.
- When connecting a second time, all harts are also running. However, during examine(), halting the first hart halts all the other harts as well (because they're in the same halt group). After examine(), hart 0 is running (because it was running when examine() first looked at it), but the remaining harts are halted (because they were halted when examine() first looked at them).
I agree that's a bug. Not sure when I'll get to it. One solution might be to clear all halt groups before examining the first hart. (But examine() is called per "target" without any extra context, so it's not as straightforward as it might be.)
One solution might be to clear all halt groups before examining the first hart.
Halt groups are used to implement OpenOCD smp target, so I think we should clear halt groups before OpenOCD disconnected from target.
so I think we should clear halt groups before OpenOCD disconnected from target.
I think we should reset halt groups even upon initial connection. Though it makes examine even more invasive :(, however the resulting behavior is more predictable.
Looking at [3.14.17. Debug Module Control and Status 2 (dmcs2, at 0x32)] it seems like halt groups should be reset when DM is reset, since the reset value for dtmcs2.group is preset, and dtmcs2.group represents the halt group (dmcs2.grouptype resets to 0 (halt))of the hart corresponding to dmcontrol.hartsel = 0 (also the reset value).
OpenOCD resets the DM upon connection, so IMHO the halt groups should be reset. Therefore this is a Spike issue.
See https://github.com/riscv/riscv-debug-spec/issues/1146 for the discussion whether my interpretation is correct.