nuttx sigdeliver/smp: restored registers should be protected by critical section

Summary

sigdeliver/smp: restored registers should be protected by critical section

[CPU1] [24] up_assert: Assertion failed CPU1 at file:armv7-a/arm_sigdeliver.c line: 73 task: waiter
[CPU1] [24] arm_registerdump: R0: 00000001 R1: 108321d8 R2: 1081922c R3: 108321d8
[CPU1] [24] arm_registerdump: R4: 10832158 R5: 10817070 R6: 1083a654 FP: 108321d8

Signed-off-by: chao.an [email protected]

Impact

N/A

Testing

sabre-6quad/smp, ostest

Mar 03 '22 07:03 anchao

@anchao

I've never seen the following assertion failure, which you described above.

[CPU1] [24] up_assert: Assertion failed CPU1 at file:armv7-a/arm_sigdeliver.c line: 73 task: waiter

Is it possible to reproduce the assertion with sabre-6quad:smp on QEMU?

Mar 03 '22 10:03 masayuki2009

Hmm, something is wrong with the CI. Let me restart the CI.

Current runner version: '2.287.1'
Operating System
Virtual Environment
Virtual Environment Provisioner
GITHUB_TOKEN Permissions
Secret source: None
Prepare workflow directory
Prepare all required actions
Getting action download info
Download action repository 'actions/download-artifact@v2' (SHA:f023be2c48cc18debc3bacd34cb396e0295e2869)
Warning: Failed to download action 'https://api.github.com/repos/actions/download-artifact/tarball/f023be2c48cc18debc3bacd34cb396e0295e2869'. Error: The request was canceled due to the configured HttpClient.Timeout of 100 seconds elapsing.
Warning: Back off 15.489 seconds before retry.
Warning: Failed to download action 'https://api.github.com/repos/actions/download-artifact/tarball/f023be2c48cc18debc3bacd34cb396e0295e2869'. Error: The request was canceled due to the configured HttpClient.Timeout of 100 seconds elapsing.
Warning: Back off 10.727 seconds before retry.
Error: The request was canceled due to the configured HttpClient.Timeout of 100 seconds elapsing.

Mar 03 '22 10:03 masayuki2009

> Is it possible to reproduce the assertion with sabre-6quad:smp on QEMU?

Yes. The issue occurred only once during a long ostest run (5+ QEMU processes, about 12 hours). The crashing case is https://github.com/apache/incubator-nuttx-apps/blob/master/testing/ostest/signest.c#L198 (not sighan).

Mar 03 '22 10:03 anchao

@anchao

I remember that https://github.com/apache/incubator-nuttx/pull/2960 changed sigdeliver handling for SMP. Do you think that this issue relates to the above PR?

Mar 04 '22 04:03 masayuki2009

> I remember that https://github.com/apache/incubator-nuttx/pull/2960 changed sigdeliver handling for SMP. Do you think that this issue relates to the above PR?

Yes, this PR is related to #2960, but the two resolve different issues.

However, I found that this PR does not fix the SMP issue completely. According to the debug trace, when CPU0 entered the critical section, CPU1 was still running the current TCB without being paused:

CPU 0 ->  up_schedule_sigaction()  in  (critical section)
CPU 1 ->  arm_sigdeliver()
CPU 0 ->  up_schedule_sigaction() out  (critical section)
panic
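
The ordering in this trace can be modelled with a small single-threaded C sketch. This is not NuttX code: `model_tcb`, `model_enter()`, `model_leave()`, `model_schedule_begin()`, `model_schedule_end()` and `model_sigdeliver()` are hypothetical names, and the "critical section" is only an owner field. It illustrates the intended behaviour under two assumptions: delivery on CPU1 has to wait while CPU0 holds the critical section, and the register restore happens inside that section.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MODEL_NREGS 4
#define MODEL_NOCPU (-1)

/* Hypothetical, simplified stand-in for the per-task signal state. */
struct model_tcb
{
  bool     sigpending;               /* a signal action has been scheduled */
  unsigned regs[MODEL_NREGS];        /* live register context */
  unsigned saved_regs[MODEL_NREGS];  /* context to restore after delivery */
};

/* Toy critical section: the CPU that owns it, or MODEL_NOCPU if free. */
static int g_owner = MODEL_NOCPU;

/* Try to take the section. A real CPU would spin or be paused instead
 * of getting "false" back; the model just reports that it must wait.
 */
static bool model_enter(int cpu)
{
  if (g_owner != MODEL_NOCPU)
    {
      return false;
    }

  g_owner = cpu;
  return true;
}

static void model_leave(int cpu)
{
  assert(g_owner == cpu);
  g_owner = MODEL_NOCPU;
}

/* First half of scheduling: take the section and save the live context. */
static bool model_schedule_begin(struct model_tcb *tcb, int cpu)
{
  if (!model_enter(cpu))
    {
      return false;
    }

  memcpy(tcb->saved_regs, tcb->regs, sizeof(tcb->regs));
  return true;
}

/* Second half: mark the action pending, then release the section. */
static void model_schedule_end(struct model_tcb *tcb, int cpu)
{
  tcb->sigpending = true;
  model_leave(cpu);
}

/* Delivery: the restore and the flag update both happen inside the
 * section, so a half-finished schedule on another CPU is never observed.
 */
static bool model_sigdeliver(struct model_tcb *tcb, int cpu)
{
  if (!model_enter(cpu))
    {
      return false;  /* another CPU is inside the section: wait */
    }

  assert(tcb->sigpending);
  memcpy(tcb->regs, tcb->saved_regs, sizeof(tcb->regs));
  tcb->sigpending = false;
  model_leave(cpu);
  return true;
}
```

Replaying the trace against this model: after `model_schedule_begin(&tcb, 0)`, a call to `model_sigdeliver(&tcb, 1)` returns false until `model_schedule_end(&tcb, 0)` has run. The trace above shows the opposite, with CPU1 delivering while CPU0 is still inside the section.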

I will investigate this issue further.

Mar 07 '22 12:03 anchao