nuttx icon indicating copy to clipboard operation
nuttx copied to clipboard

ARM64 SMP IRQ stack bug modify request

Open GUIDINGLI opened this issue 1 year ago • 4 comments

Summary

ARM64 SMP IRQ stack bug modify request

In previous PR: https://github.com/apache/nuttx/pull/13520

We have fix a SMP bug in ARM64, will cause by don't use interrupt stack in SMP mode. The rootcause you can find in the SMP description:

config SMP
    bool "Symmetric Multi-Processing (SMP)"
    default n
    depends on ARCH_HAVE_MULTICPU
    depends on ARCH_HAVE_TESTSET
    depends on ARCH_INTERRUPTSTACK != 0 
    select SPINLOCK
    select IRQCOUNT
    ---help---
        Enables support for Symmetric Multi-Processing (SMP) on a multi-CPU
        platform.

        N.B. SMP mode requires the use of ARCH_INTERRUPTSTACK:

        CPU0 thread0  ->  IRQ enter -> add thread0 to block list -> IRQ leave(crash)
                                                                ||
                                                                /\
                                                               /  \
        CPU1 thread1  ->  block_task -> take thread0 from block_list -> run thread0

        CPU0 IRQ handler use thread0's stack, but thread0 may switch to CPU1, that 
        will caused IRQ handler stack corruption.

So we must use the IRQ stack in svc/IRQ ISR entry: arm64_sync_exc arm64_irq_handler

But this modification has conflict with the kernel mode which provide by @pussuw @jlaitine

Then we revert the changes emergency in : https://github.com/apache/nuttx/pull/13590

But this bug still exist in current code.

There are two method to resolve this:

  1. REVERT YOUR KERNEL MODE RELATED CODE IN ARM64, and I will fix this bug in my method, then you can apply your kernel mode changes.
  2. FIX THE SMP BUG IN YOUR METHOD !

Another thing, about the FPU, The removable of FPU trap reason:

  1. The FPU trap logic is not stable in some SMP situation (HERE haven't the detail analysis)
  2. Make the logic Complex and different to read.
  3. The most important reason: it has almost no benefit for the performance, because the compiler will optimize the code with float/double instructions when use the option -O3 (always be). Even the printf/syslog will use use the FPU, so the FPU is used too often and almost each context switch. Or, you must use -mgeneral-regs-only to limit the FPU registers usage, that will harmful for the speed.
  4. How about use the FPU registers in the IRQ/SVC ISR, maybe there are hidden bugs ?

So, suggest remove the FPU trap, save/restore in every context switch, similar with ARMv7-a/r

Impact

None

Testing

None

GUIDINGLI avatar Sep 26 '24 16:09 GUIDINGLI

@GUIDINGLI please send this change request to issue list, this is not a pull request for review

anchao avatar Sep 27 '24 00:09 anchao

I going to revert the arm64 patch use this PR if they are not replay for a long time

GUIDINGLI avatar Sep 27 '24 02:09 GUIDINGLI

@pussuw could you look at this problem?

xiaoxiang781216 avatar Sep 27 '24 03:09 xiaoxiang781216

@pussuw could you look at this problem?

I already responded to #13520 I'll take a closer look today or on Monday

pussuw avatar Sep 27 '24 05:09 pussuw