nuttx
nuttx copied to clipboard
SMP: fix crash when switch to new task which is still running
Summary
SMP: fix crash when switch to new task which is still running
Situation:
Assume we have 2 cpus, and busy run task0.
CPU0 CPU1 task0 -> task1 task2 -> task0
- remove task0 form runninglist
- take task1 as new tcb
- add task0 to blocklist
- clear spinlock 4.1 remove task2 form runninglist 4.2 take task0 as new tcb 4.3 add task2 to blocklist 4.4 use svc ISR swith to task0 4.5 crash
- use svc ISR swith to task1
Fix: Move clear spinlock to the end of svc ISR
Signed-off-by: ligd [email protected]
Impact
SMP
Testing
VELA
@masayuki2009 Could you help to review & test this ?
@GUIDINGLI
Let me try this PR later.
@GUIDINGLI It seems that this PR affects all CPU architectures which support SMP. However, you modified ARM only. Please test this PR with all CPU architectures including sim.
@GUIDINGLI Also, can you provide the test code so that we can reproduce the bug?
@GUIDINGLI Also, can you provide the test code so that we can reproduce the bug?
start a hello world main like this. while(1);
just busy loop one cpu, then do anything in other thread.
@GUIDINGLI Also, can you provide the test code so that we can reproduce the bug?
start a hello world main like this. while(1);
just busy loop one cpu, then do anything in other thread.
@GUIDINGLI
I tried your test case with both sabre-6quad:smp (CONFIG_SMP_NCPUS=2)
and rv-virt:smp64 (CONFIG_SMP_NCPUS=2)
on qemu-6.2 but I couldn't reproduce the bug.
Can you reproduce the issue with the above environments?
ALL the arch.
ALL the arch.
@GUIDINGLI
Hmm, it seems that sim:smp
stops with DEBUGASSERT
$ ./nuttx
[CPU0] up_assert: Assertion failed CPU0 at file:irq/irq_csection.c line: 348 task: loop_task
[CPU0] up_assert: Assertion failed CPU0 at file:irq/irq_csection.c line: 348 task: loop_task
[CPU0] up_assert: Assertion failed CPU0 at file:irq/irq_csection.c line: 348 task: loop_task
[CPU0] up_assert: Assertion failed CPU0 at file:irq/irq_csection.c line: 348 task: loop_task
[CPU0] up_assert: Assertion failed CPU0 at file:irq/irq_csection.c line: 348 task: loop_task
@GUIDINGLI
Can you attach both defconfig and .config that you can reproduce the issue?
I meet another problem related to this, I'm work on it.
@GUIDINGLI do we need to convert this change to draft while you are investigating the new issue you met?
This should be test base on: https://github.com/apache/incubator-nuttx/pull/7015
@masayuki2009 please help to review
@masayuki2009 could you test with your hardware?
@GUIDINGLI Can you reproduce the crash with QEMU?
Can you reproduce the crash with QEMU?
@GUIDINGLI Also, please rebase this PR to the latest master.
Can you reproduce the crash with QEMU?
@GUIDINGLI Also, please rebase this PR to the latest master.
done
@GUIDINGLI Can you reproduce the crash with QEMU?
I have tried on sabre-6quad:netnsh_smp, can't reproduce this bug. And tried set NCPUS=2, close RR_INTERVAL, set CONFIG_SYSTEM_NSH_PRIORITY=255, But can't reproduce.
I have tried on sabre-6quad:netnsh_smp, can't reproduce this bug. And tried set NCPUS=2, close RR_INTERVAL, set CONFIG_SYSTEM_NSH_PRIORITY=255, But can't reproduce.
@GUIDINGLI Hmm, is it true that you can still reproduce the issue with VELA? If so, please attach both of defconfig and .config so that we can look into them.
Also,
I noticed that qemu-a53:nsh_smp
does not work with this PR.
qemu-a53:nsh_smp has been fixed.
Also correct the risc-v & mips arch.
is it true that you can still reproduce the issue with VELA? -I will do the double check.
@GUIDINGLI Currently, armv8-m and mips32 do not support SMP. So I think the following file should not be changed.
arch/arm/src/armv8-m/arm_doirq.c arch/mips/src/mips32/mips_doirq.c arch/mips/src/pic32mx/pic32mx_decodeirq.c arch/mips/src/pic32mz/pic32mz_decodeirq.c
Current not support, maybe support in future. So it is better modified all.
Current not support, maybe support in future. So it is better modified all.
@GUIDINGLI If so, how about other CPU architectures such as x86_64 and sparc? In my personal opinion, this PR should focus on the currently available SMP platforms.
let's add all arch? Otherwise people may forget to apply this change to other arch if they implement SMP for that arch someday.
@masayuki2009 @xiaoxiang781216 Now we only need modify the arch which already support SMP. So, I remove the mips & add ceva.
LGTM, let's ignore the false alarm for macOS CI.