sigdeliver/smp: restored registers should be protected by critical section
Summary
[CPU1] [24] up_assert: Assertion failed CPU1 at file:armv7-a/arm_sigdeliver.c line: 73 task: waiter
[CPU1] [24] arm_registerdump: R0: 00000001 R1: 108321d8 R2: 1081922c R3: 108321d8
[CPU1] [24] arm_registerdump: R4: 10832158 R5: 10817070 R6: 1083a654 FP: 108321d8
Signed-off-by: chao.an [email protected]
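A minimal user-space sketch of what the title asks for, offered as a hedged illustration and not the actual NuttX patch. It assumes two pthreads stand in for the two CPUs and a pthread mutex stands in for enter_critical_section()/leave_critical_section(). `struct fake_tcb`, `restore_side()`, `schedule_side()` and `run_demo()` are hypothetical names. Because the register restore and the concurrent update of the saved context both run under the same lock, the checker should never observe a half-restored register set.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define NREGS      8
#define ITERATIONS 100000

/* Hypothetical, simplified task context; NOT NuttX's struct tcb_s. */
struct fake_tcb
{
  uint32_t regs[NREGS];        /* live register context */
  uint32_t saved_regs[NREGS];  /* registers saved before the handler ran */
};

static struct fake_tcb g_tcb;

/* A pthread mutex stands in for enter/leave_critical_section(). */
static pthread_mutex_t g_crit = PTHREAD_MUTEX_INITIALIZER;

/* Number of times a partially restored register set was observed. */
static int g_torn;

/* "CPU1": copy the saved registers back, as the signal-delivery path does
 * after the handler returns.  The whole copy happens inside the lock. */
static void *restore_side(void *arg)
{
  (void)arg;
  for (int i = 0; i < ITERATIONS; i++)
    {
      pthread_mutex_lock(&g_crit);
      for (int r = 0; r < NREGS; r++)
        {
          g_tcb.regs[r] = g_tcb.saved_regs[r];
        }
      pthread_mutex_unlock(&g_crit);
    }
  return NULL;
}

/* "CPU0": check that the live registers are never half-restored, then
 * rewrite the saved context, as a concurrent sigaction request might. */
static void *schedule_side(void *arg)
{
  (void)arg;
  for (uint32_t i = 1; i <= ITERATIONS; i++)
    {
      pthread_mutex_lock(&g_crit);
      for (int r = 1; r < NREGS; r++)
        {
          if (g_tcb.regs[r] != g_tcb.regs[0])
            {
              g_torn++;
            }
        }
      for (int r = 0; r < NREGS; r++)
        {
          g_tcb.saved_regs[r] = i;
        }
      pthread_mutex_unlock(&g_crit);
    }
  return NULL;
}

/* Runs both "CPUs" concurrently; returns the number of torn observations. */
int run_demo(void)
{
  pthread_t cpu0;
  pthread_t cpu1;

  g_torn = 0;
  pthread_create(&cpu1, NULL, restore_side, NULL);
  pthread_create(&cpu0, NULL, schedule_side, NULL);
  pthread_join(cpu1, NULL);
  pthread_join(cpu0, NULL);
  return g_torn;
}
```

Calling `run_demo()` from a `main()` should return 0. Removing the lock/unlock pair from `restore_side()` makes torn observations possible, which is the kind of inconsistent register state a DEBUGASSERT in the signal-delivery path would catch.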
Impact
N/A
Testing
sabre-6quad/smp, ostest
@anchao
I've never seen the DEBUGASSERT failure you described above:
[CPU1] [24] up_assert: Assertion failed CPU1 at file:armv7-a/arm_sigdeliver.c line: 73 task: waiter
Is it possible to reproduce the assertion with sabre-6quad:smp on QEMU?
Hmm, something is wrong with CI. Let me restart it.
Current runner version: '2.287.1'
Secret source: None
Prepare workflow directory
Prepare all required actions
Getting action download info
Download action repository 'actions/download-artifact@v2' (SHA:f023be2c48cc18debc3bacd34cb396e0295e2869)
Warning: Failed to download action 'https://api.github.com/repos/actions/download-artifact/tarball/f023be2c48cc18debc3bacd34cb396e0295e2869'. Error: The request was canceled due to the configured HttpClient.Timeout of 100 seconds elapsing.
Warning: Back off 15.489 seconds before retry.
Warning: Failed to download action 'https://api.github.com/repos/actions/download-artifact/tarball/f023be2c48cc18debc3bacd34cb396e0295e2869'. Error: The request was canceled due to the configured HttpClient.Timeout of 100 seconds elapsing.
Warning: Back off 10.727 seconds before retry.
Error: The request was canceled due to the configured HttpClient.Timeout of 100 seconds elapsing.
Is it possible to reproduce the assertion with sabre-6quad:smp on QEMU?
Yes, but only rarely: the issue occurred just once while running ostest for a long time (5+ QEMU processes, about 12 hours). The crashing case is https://github.com/apache/incubator-nuttx-apps/blob/master/testing/ostest/signest.c#L198 (not sighand).
@anchao
I remember that https://github.com/apache/incubator-nuttx/pull/2960 changed sigdeliver handling for SMP. Do you think that this issue relates to the above PR?
I remember that https://github.com/apache/incubator-nuttx/pull/2960 changed sigdeliver handling for SMP. Do you think that this issue relates to the above PR?
Yes, this PR is related to #2960, but they resolve different issues.
However, I found that this PR does not fix the SMP issue completely. The debug trace shows that while CPU0 is inside the critical section, CPU1 keeps running the current TCB without being paused:
CPU 0 -> up_schedule_sigaction() in (critical section)
CPU 1 -> arm_sigdeliver()
CPU 0 -> up_schedule_sigaction() out (critical section)
panic
I will investigate this issue further.
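The failing interleaving in the trace can be modeled with another hypothetical user-space sketch, again not NuttX code. An `atomic_flag` spinlock with illustrative helpers `cs_enter()`/`cs_leave()` (not NuttX APIs) plays the role of the SMP critical section. `cpu0()` logs entry and exit around its update, and `cpu1()` logs the signal-delivery step. Because `cpu1()` also takes the lock, its event cannot land between CPU0's "in" and "out", which is the ordering the trace above violated. The sketch only shows the intended ordering; it says nothing about whether the real fix should pause CPU1 or make it wait on the lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical global spinlock standing in for the SMP critical section. */
static atomic_flag g_cs = ATOMIC_FLAG_INIT;

static void cs_enter(void)
{
  while (atomic_flag_test_and_set_explicit(&g_cs, memory_order_acquire))
    {
      /* spin until the other "CPU" leaves the critical section */
    }
}

static void cs_leave(void)
{
  atomic_flag_clear_explicit(&g_cs, memory_order_release);
}

enum event { EV_SCHED_IN, EV_SIGDELIVER, EV_SCHED_OUT };

static enum event g_log[3];
static atomic_int g_nlog;

/* Each call claims a distinct slot, so no two threads write the same entry. */
static void log_event(enum event e)
{
  g_log[atomic_fetch_add(&g_nlog, 1)] = e;
}

/* "CPU0": the up_schedule_sigaction() side of the trace. */
static void *cpu0(void *arg)
{
  (void)arg;
  cs_enter();
  log_event(EV_SCHED_IN);
  /* ... the target task's saved context would be modified here ... */
  log_event(EV_SCHED_OUT);
  cs_leave();
  return NULL;
}

/* "CPU1": the arm_sigdeliver() side; it must also take the lock. */
static void *cpu1(void *arg)
{
  (void)arg;
  cs_enter();
  log_event(EV_SIGDELIVER);
  cs_leave();
  return NULL;
}

/* Returns 1 if CPU0's "in" and "out" events were logged back to back. */
int run_trace(void)
{
  pthread_t t0;
  pthread_t t1;

  atomic_store(&g_nlog, 0);
  pthread_create(&t0, NULL, cpu0, NULL);
  pthread_create(&t1, NULL, cpu1, NULL);
  pthread_join(t0, NULL);
  pthread_join(t1, NULL);

  for (int i = 0; i < 2; i++)
    {
      if (g_log[i] == EV_SCHED_IN)
        {
          return g_log[i + 1] == EV_SCHED_OUT;
        }
    }
  return 0;
}
```

Either CPU may win the lock first, so the log is either in/out/sigdeliver or sigdeliver/in/out, but never in/sigdeliver/out as in the panic trace.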