signal: fix deadlock when sigdeliver calls enter_critical_section
Summary
cpu0                                cpu1
user_main
signest_test
sched_unlock
nxsched_merge_pending
nxsched_add_readytorun
up_cpu_pause
                                    arm_sigdeliver
                                    enter_critical_section
Reason: On SMP, cpu0 is already in the critical section and is waiting for cpu1 to enter the suspended state. However, cpu1 is executing arm_sigdeliver with interrupts disabled but without holding the critical section. In that state cpu1 cannot respond to interrupts, yet it keeps trying to enter the critical section, so neither CPU can make progress and the system deadlocks.
Resolve: adjust the logic so that arm_sigdeliver does not enter the critical section while interrupts are disabled.
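The condition described above can be sketched as a small single-threaded model. This is only an illustration under stated assumptions, not the code in this patch: the `model_*` names, the `g_lock_owner` variable, and the `struct model_cpu` type are all hypothetical and do not exist in NuttX. The model treats the critical section as one global lock and assumes a CPU can acknowledge a pause request only while its interrupts are enabled.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NO_OWNER (-1)

/* Hypothetical stand-in for the SMP-wide critical-section lock. */
static int g_lock_owner = MODEL_NO_OWNER;

/* Hypothetical per-CPU state; not a NuttX structure. */
struct model_cpu
{
  int  id;
  bool irq_disabled;  /* true: this CPU cannot service a pause request */
};

/* One attempt to take the lock. Returns true on success. */
static bool model_try_enter(const struct model_cpu *cpu)
{
  if (g_lock_owner != MODEL_NO_OWNER && g_lock_owner != cpu->id)
    {
      return false;  /* another CPU holds the lock */
    }

  g_lock_owner = cpu->id;
  return true;
}

/* Release the lock; only the current owner may do so. */
static void model_leave(const struct model_cpu *cpu)
{
  assert(g_lock_owner == cpu->id);
  g_lock_owner = MODEL_NO_OWNER;
}

/* Assumption: a pause request is acknowledged only with interrupts enabled. */
static bool model_can_ack_pause(const struct model_cpu *cpu)
{
  return !cpu->irq_disabled;
}

/* Deadlock condition from the description: the owner waits for the other
 * CPU to pause, while that CPU spins on the lock and cannot acknowledge
 * the pause request.
 */
static bool model_would_deadlock(const struct model_cpu *owner,
                                 const struct model_cpu *waiter,
                                 bool waiter_wants_lock)
{
  bool lock_unavailable = (g_lock_owner == owner->id);

  return lock_unavailable && waiter_wants_lock &&
         !model_can_ack_pause(waiter);
}

/* Resolution as described: do not try to enter the critical section
 * while interrupts are disabled.
 */
static bool model_sigdeliver_wants_lock(const struct model_cpu *cpu)
{
  return !cpu->irq_disabled;
}
```

With cpu0 holding the lock and cpu1 running with interrupts disabled, `model_would_deadlock()` reports true if cpu1 unconditionally wants the lock, and false once the decision is taken from `model_sigdeliver_wants_lock()`.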
Impact
NONE
Testing
test: We can use qemu for testing.
compiling:
make distclean -j20; ./tools/configure.sh -l qemu-armv8a:nsh_smp; make -j20
running:
qemu-system-aarch64 -cpu cortex-a53 -smp 4 -nographic -machine virt,virtualization=on,gic-version=3 -net none -chardev stdio,id=con,mux=on -serial chardev:con -mon chardev=con,mode=readline -kernel ./nuttx
@hujun260 Why didn't you modify arch/sim/src/sim/sim_sigdeliver.c? I think you need to modify the file as well.
The implementation of signal processing in SIM is quite special. When we enter sim_sigdeliver, we are already in the critical section, so there is no chance of a deadlock.
@hujun260 Hmm, it seems that ASSERT still happens when I run ostest with sim:smp on ubuntu 20.04 (amd64).
NuttShell (NSH) NuttX-12.5.1
nsh> uname -a
NuttX 12.5.1 4197b5aec8 May 26 2024 21:53:42 sim sim
nsh> ps
PID GROUP CPU PRI POLICY TYPE NPX STATE EVENT SIGMASK STACK USED FILLED COMMAND
0 0 0 0 FIFO Kthread - Assigned 0000000000000000 066544 004744 7.1% CPU0 IDLE
1 1 1 0 FIFO Kthread - Running 0000000000000000 066544 008792 13.2% CPU1 IDLE
2 2 2 0 FIFO Kthread - Running 0000000000000000 066544 008744 13.1% CPU2 IDLE
3 3 3 0 FIFO Kthread - Running 0000000000000000 066544 004664 7.0% CPU3 IDLE
4 4 --- 224 FIFO Kthread - Waiting Signal 0000000000000000 067536 004904 7.2% loop_task
5 5 --- 224 FIFO Kthread - Waiting Semaphore 0000000000000000 067504 001048 1.5% hpwork 0x46b6a0 0x46b6c8
6 6 0 100 FIFO Task - Running 0000000000000000 067536 003304 4.8% nsh_main
nsh> free
total used free maxused maxfree nused nfree
Umem: 67108864 414416 66694448 414384 66694448 30 1
nsh> ostest
stdio_test: write fd=1
stdio_test: Standard I/O Check: printf
stdio_test: write fd=2
...
user_main: nested signal handler test
signest_test: Starting signal waiter task at priority 101
signest_test: Started waiter_main pid=175
waiter_main: Waiter started
signest_test: Starting interfering task at priority 102
waiter_main: Setting signal mask
waiter_main: Registering signal handler
signest_test: Started interfere_main pid=185
interfere_main: Waiting on semaphore
waiter_main: Waiting on semaphore
signest_test: Simple case:
Total signalled 1240 Odd=620 Even=620
Total handled 1240 Odd=620 Even=620
Total nested 0 Odd=0 Even=0
signest_test: With task locking
Total signalled 2480 Odd=1240 Even=1240
Total handled 2480 Odd=1240 Even=1240
Total nested 0 Odd=0 Even=0
[CPU0] _assert: Current Version: NuttX 12.5.1 4197b5aec8 May 26 2024 21:53:42 sim
[CPU0] _assert: Assertion failed : at file: signest.c:228 task(CPU0): ostest process: ostest 0x43af44
ostest_main: Exiting with status 256
stdio_test: Standard I/O Check: fprintf to stderr
nsh>
I have debugged the issue you mentioned, and it is not caused by a deadlock. It appears to have a different cause, so my patch does not resolve it.