ARM64 SMP IRQ stack bug modify request
Summary
ARM64 SMP IRQ stack bug modify request
In previous PR: https://github.com/apache/nuttx/pull/13520
We have fix a SMP bug in ARM64, will cause by don't use interrupt stack in SMP mode. The rootcause you can find in the SMP description:
config SMP
bool "Symmetric Multi-Processing (SMP)"
default n
depends on ARCH_HAVE_MULTICPU
depends on ARCH_HAVE_TESTSET
depends on ARCH_INTERRUPTSTACK != 0
select SPINLOCK
select IRQCOUNT
---help---
Enables support for Symmetric Multi-Processing (SMP) on a multi-CPU
platform.
N.B. SMP mode requires the use of ARCH_INTERRUPTSTACK:
CPU0 thread0 -> IRQ enter -> add thread0 to block list -> IRQ leave(crash)
||
/\
/ \
CPU1 thread1 -> block_task -> take thread0 from block_list -> run thread0
CPU0 IRQ handler use thread0's stack, but thread0 may switch to CPU1, that
will caused IRQ handler stack corruption.
So we must use the IRQ stack in svc/IRQ ISR entry: arm64_sync_exc arm64_irq_handler
But this modification has conflict with the kernel mode which provide by @pussuw @jlaitine
Then we revert the changes emergency in : https://github.com/apache/nuttx/pull/13590
But this bug still exist in current code.
There are two method to resolve this:
- REVERT YOUR KERNEL MODE RELATED CODE IN ARM64, and I will fix this bug in my method, then you can apply your kernel mode changes.
- FIX THE SMP BUG IN YOUR METHOD !
Another thing, about the FPU, The removable of FPU trap reason:
- The FPU trap logic is not stable in some SMP situation (HERE haven't the detail analysis)
- Make the logic Complex and different to read.
- The most important reason: it has almost no benefit for the performance, because the compiler will optimize the code with float/double instructions when use the option -O3 (always be). Even the printf/syslog will use use the FPU, so the FPU is used too often and almost each context switch. Or, you must use -mgeneral-regs-only to limit the FPU registers usage, that will harmful for the speed.
- How about use the FPU registers in the IRQ/SVC ISR, maybe there are hidden bugs ?
So, suggest remove the FPU trap, save/restore in every context switch, similar with ARMv7-a/r
Impact
None
Testing
None
@GUIDINGLI please send this change request to issue list, this is not a pull request for review
I going to revert the arm64 patch use this PR if they are not replay for a long time
@pussuw could you look at this problem?
@pussuw could you look at this problem?
I already responded to #13520 I'll take a closer look today or on Monday