signal: fix deadlock when sigdeliver calls enter_critical_section

Open hujun260 opened this issue 1 year ago • 4 comments

Summary

Call paths on the two CPUs:

cpu0: user_main -> signest_test -> sched_unlock -> nxsched_merge_pending -> nxsched_add_readytorun -> up_cpu_pause
cpu1: arm_sigdeliver -> enter_critical_section

Reason: In SMP, cpu0 is already in the critical section and is waiting for cpu1 to enter the suspended state. However, when cpu1 executes arm_sigdeliver, it has interrupts disabled but is not in the critical section. At this point cpu1 cannot respond to interrupts, so it never services cpu0's pause request, and it keeps spinning while trying to enter the critical section. Neither CPU can make progress, which results in a deadlock.

Resolve: adjust the logic so that the critical section is not entered while interrupts are disabled.
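
To make the reasoning above concrete, here is a minimal single-threaded sketch of the scenario. It is a toy model, not the NuttX code: the struct, its fields, and `makes_progress` are hypothetical names, and the model assumes that a cpu1 which skips the critical section eventually reaches a point where interrupts are re-enabled.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the reported deadlock; not NuttX code.
 * All names below are hypothetical.
 */

struct cpu1_model
{
  bool irq_enabled;    /* Can cpu1 take the pause interrupt from cpu0? */
  bool wants_critical; /* Is cpu1 spinning to enter the critical section? */
};

/* cpu0 holds the critical section and waits for cpu1 to acknowledge a
 * pause request.  Each loop iteration is one time step.  Returns true if
 * cpu1 acknowledges within max_steps, false if both CPUs are still stuck.
 */

static bool makes_progress(struct cpu1_model cpu1, int max_steps)
{
  assert(max_steps > 0);

  for (int step = 0; step < max_steps; step++)
    {
      if (cpu1.irq_enabled)
        {
          /* cpu1 takes the pause interrupt, so cpu0's wait ends. */

          return true;
        }

      if (!cpu1.wants_critical)
        {
          /* Model assumption: without the spin, cpu1 keeps running and
           * re-enables interrupts on the next step.
           */

          cpu1.irq_enabled = true;
          continue;
        }

      /* cpu1 spins with interrupts disabled: nothing can change. */
    }

  return false;
}
```

In this model, a cpu1 with interrupts disabled that spins on the critical section never makes progress, while the same cpu1 that skips the critical section does, which matches the intent described in the Resolve line.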

Impact

NONE

Testing

We can use QEMU for testing.

Compiling:

make distclean -j20; ./tools/configure.sh -l qemu-armv8a:nsh_smp; make -j20

Running:

qemu-system-aarch64 -cpu cortex-a53 -smp 4 -nographic -machine virt,virtualization=on,gic-version=3 -net none -chardev stdio,id=con,mux=on -serial chardev:con -mon chardev=con,mode=readline -kernel ./nuttx

hujun260, May 20 '24 04:05

@hujun260 Why didn't you modify arch/sim/src/sim/sim_sigdeliver.c? I think you need to modify the file as well.

masayuki2009, May 20 '24 12:05

The implementation of signal processing in SIM is quite special. When we enter sim_sigdeliver, we are already in the critical section, so there is no chance of a deadlock.
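
As an illustration of why an enter call made by a CPU that already holds the critical section cannot block, here is a simplified sketch of a recursive critical section. This is an assumption-laden model rather than the NuttX implementation: `crit_model`, `crit_try_enter`, and `crit_leave` are hypothetical names, and ownership is tracked per CPU purely for simplicity.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a recursive, global critical section; not NuttX code.
 * All names below are hypothetical.
 */

#define NO_OWNER (-1)

struct crit_model
{
  int owner; /* Index of the CPU holding the section, or NO_OWNER */
  int count; /* Nesting depth of the current owner */
};

/* Returns true if the CPU gets in immediately.  A false return stands for
 * the case where a real implementation would have to spin.
 */

static bool crit_try_enter(struct crit_model *cs, int cpu)
{
  if (cs->owner == cpu)
    {
      /* Already inside: only the nesting depth grows, no waiting. */

      cs->count++;
      return true;
    }

  if (cs->owner == NO_OWNER)
    {
      cs->owner = cpu;
      cs->count = 1;
      return true;
    }

  return false;
}

static void crit_leave(struct crit_model *cs, int cpu)
{
  assert(cs->owner == cpu && cs->count > 0);

  if (--cs->count == 0)
    {
      cs->owner = NO_OWNER;
    }
}
```

Under this model, a nested enter by the current owner always succeeds at once, so only a CPU that is outside the section can end up waiting for it.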

hujun260, May 21 '24 09:05

@hujun260 Hmm, it seems that an ASSERT still fires when I run ostest with sim:smp on Ubuntu 20.04 (amd64).

NuttShell (NSH) NuttX-12.5.1
nsh> uname -a
NuttX 12.5.1 4197b5aec8 May 26 2024 21:53:42 sim sim
nsh> ps
  PID GROUP CPU PRI POLICY   TYPE    NPX STATE    EVENT     SIGMASK           STACK   USED  FILLED COMMAND
    0     0   0   0 FIFO     Kthread   - Assigned           0000000000000000 066544 004744   7.1%  CPU0 IDLE
    1     1   1   0 FIFO     Kthread   - Running            0000000000000000 066544 008792  13.2%  CPU1 IDLE
    2     2   2   0 FIFO     Kthread   - Running            0000000000000000 066544 008744  13.1%  CPU2 IDLE
    3     3   3   0 FIFO     Kthread   - Running            0000000000000000 066544 004664   7.0%  CPU3 IDLE
    4     4 --- 224 FIFO     Kthread   - Waiting  Signal    0000000000000000 067536 004904   7.2%  loop_task
    5     5 --- 224 FIFO     Kthread   - Waiting  Semaphore 0000000000000000 067504 001048   1.5%  hpwork 0x46b6a0 0x46b6c8
    6     6   0 100 FIFO     Task      - Running            0000000000000000 067536 003304   4.8%  nsh_main
nsh> free
                 total       used       free    maxused    maxfree  nused  nfree
      Umem:   67108864     414416   66694448     414384   66694448     30      1
nsh> ostest
stdio_test: write fd=1
stdio_test: Standard I/O Check: printf
stdio_test: write fd=2
...
user_main: nested signal handler test
signest_test: Starting signal waiter task at priority 101
signest_test: Started waiter_main pid=175
waiter_main: Waiter started
signest_test: Starting interfering task at priority 102
waiter_main: Setting signal mask
waiter_main: Registering signal handler
signest_test: Started interfere_main pid=185
interfere_main: Waiting on semaphore
waiter_main: Waiting on semaphore
signest_test: Simple case:
  Total signalled 1240  Odd=620 Even=620
  Total handled   1240  Odd=620 Even=620
  Total nested    0    Odd=0   Even=0  
signest_test: With task locking
  Total signalled 2480  Odd=1240 Even=1240
  Total handled   2480  Odd=1240 Even=1240
  Total nested    0    Odd=0   Even=0  
[CPU0] _assert: Current Version: NuttX  12.5.1 4197b5aec8 May 26 2024 21:53:42 sim
[CPU0] _assert: Assertion failed : at file: signest.c:228 task(CPU0): ostest process: ostest 0x43af44
ostest_main: Exiting with status 256
stdio_test: Standard I/O Check: fprintf to stderr
nsh>

masayuki2009, May 26 '24 13:05

I have debugged the issue you mentioned, and it is not caused by a deadlock. It appears to have a different cause, so my patch does not resolve it.

hujun260, May 27 '24 01:05