nuttx icon indicating copy to clipboard operation
nuttx copied to clipboard

armv7/8m: use pendsv to handle context switch

Open GUIDINGLI opened this issue 1 year ago • 16 comments

Summary

armv6/7/8m: use pendsv to handle context switch

This PR support Nested interrupt in armv6/7/8m:

There are two types of nested interrupt model:

  1. Zero latency nested interrupt
Interrupt           Priority            Note
Data abort          Highest
SVC                 0x50
High irq1           0x60             ISR can't access system API
irq_save()          0x70
High irq2           0x80             ISR can't access system API
normal irq3         0xB0

We have already support this mode before this PR

  1. Nested interrupt which interrupt level lower than up_irq_save()
Interrupt           Priority            Note
Data abort          Highest
SVC                 0x70
irq_save()          0x80
High irq1           0x90              ISR can access system API
High irq2           0xA0              ISR can access system API
normal irq3         0xB0

Now, this PR can support this mode

Impact

arm-m

Testing

qemu

./tools/configure.sh mps2-an500:nsh -j8

which described in: https://github.com/apache/nuttx-apps/pull/2535

GUIDINGLI avatar Sep 25 '24 10:09 GUIDINGLI

[Experimental Bot, please feedback here]

NuttX PR Requirements Review:

Verdict: The PR does not fully meet the NuttX requirements.

Reasoning:

  • Summary:
    • While the summary describes the technical implementation ("use pendsv to handle context switch"), it lacks a clear explanation of why this change is necessary.
    • What specific problems did the lack of nested interrupt support cause?
    • The summary should explicitly state that this enhances nested interrupt support for ARMv6/7/8m architectures.
  • Impact:
    • Impact on user: The PR states "arm-m" as the impact. This is insufficient. It needs to clarify if users need to adapt to this change. Will existing code using interrupts behave differently?
    • Other impacts: The PR is missing information about potential impacts on:
      • Build process
      • Documentation (does this require updates?)
      • Security (are there any implications of changing interrupt handling?)
      • Compatibility (any backward/forward compatibility concerns?)
  • Testing:
    • The testing section lacks details.
    • Build Hosts: Specify the OS, CPU architecture, and compiler versions used for testing.
    • Target(s): Provide more specific information about the "airoha chip" and the QEMU configuration used.
    • Logs: Instead of just stating "your testing logs here," include relevant snippets of the logs demonstrating the functionality before and after the change. Highlight the key differences and improvements.

Recommendations:

  1. Improve the Summary: Explain the "why" behind the change. Be explicit about the benefits of nested interrupt support and how this PR addresses them.
  2. Expand Impact Assessment: Address all impact points with specific details. If an impact area is not applicable, state it explicitly (e.g., "Impact on build: N/A").
  3. Provide Thorough Testing Information: Include specific details about the testing environment, target hardware/configurations, and relevant log snippets showcasing the improvement.

By addressing these points, the PR will be more informative and better aligned with NuttX's contribution guidelines.

nuttxpr avatar Sep 25 '24 10:09 nuttxpr

@GUIDINGLI did you test High Priority / Zero Latency support (it is called highpri board config) ?

acassis avatar Sep 25 '24 22:09 acassis

airoha chip & qemu

I think airoha chip is not included to the NuttX upstream. Why don't you test this PR with supported boards such as STM32F4Discovery?

Also, please specify which configuration you tested with QEMU.

masayuki2009 avatar Sep 25 '24 23:09 masayuki2009

airoha chip & qemu

I think airoha chip is not included to the NuttX upstream. Why don't you test this PR with supported boards such as STM32F4Discovery?

Also, please specify which configuration you tested with QEMU.

Why must use STM32F4Discovery board ? Did the irq driver support HIGH prio irq in this board ?

If some one chip wants the nested irq which described in the Summary second model, the irq handler must obey some rules:

  1. the hw irq prio should not upon irq_save()
  2. the ISR should consider interrupted by other irq, (not a atomic operation any more)

So, I don't think the STM32F4Discovery chip driver have consider this.

And I confirm the airoha irq driver have handled the rules in above. This mainly tell the reviewer, I have test the PR in a real chip, a real project, a real device.

Also you want run the PR in your own environment. Then I provide the Qemu test case. See the Summary.

GUIDINGLI avatar Sep 26 '24 05:09 GUIDINGLI

Why must use STM32F4Discovery board ? Did the irq driver support HIGH prio irq in this board ?

@GUIDINGLI So, do you mean that this PR ** does not affect ** the boards that do not use the HIGH prio irq?

masayuki2009 avatar Sep 26 '24 05:09 masayuki2009

Why must use STM32F4Discovery board ? Did the irq driver support HIGH prio irq in this board ?

@GUIDINGLI So, do you mean that this PR ** does not affect ** the boards that do not use the HIGH prio irq?

It does, I also have test the normal mode in other board, but not STM32F4Discovery.

In fact, we have a test team, and do the automation daily test in several boards, like: BES board on xiaomi watch, BES board on smart speaker, and so on. This can cover the normal prio irq.

NO STM32F4Discovery current now, next step I can suggest the test team add the this board to automation test

GUIDINGLI avatar Sep 26 '24 06:09 GUIDINGLI

@xiaoxiang781216 @masayuki2009 @acassis @pkarashchenko @raiden00pl

Please help to review this PR

GUIDINGLI avatar Sep 26 '24 09:09 GUIDINGLI

could you move all changes other than related to pendsv to a separate PR ? This is a change with a potentially huge impact, so it's better to have it isolated if possible.

raiden00pl avatar Sep 26 '24 15:09 raiden00pl

could you move all changes other than related to pendsv to a separate PR ? This is a change with a potentially huge impact, so it's better to have it isolated if possible.

OK, split to: https://github.com/apache/nuttx/pull/13651

I will rebase this PR, after 13651 merged.

GUIDINGLI avatar Sep 26 '24 16:09 GUIDINGLI

@GUIDINGLI High Performance / Zero Latency is something different and I think you should test it too: https://cwiki.apache.org/confluence/display/NUTTX/High+Performance%2C+Zero+Latency+Interrupts

acassis avatar Sep 26 '24 17:09 acassis

What about this workaround for high priority interrupts which is based on PendSV ? https://nuttx.apache.org/docs/latest/guides/zerolatencyinterrupts.html#getting-back-into-the-game

If PendSV is used for something else, then it's not possible.

raiden00pl avatar Sep 26 '24 18:09 raiden00pl

@GUIDINGLI High Performance / Zero Latency is something different and I think you should test it too: https://cwiki.apache.org/confluence/display/NUTTX/High+Performance%2C+Zero+Latency+Interrupts

Got it, I will test the Zero Latency nested interrupt.

GUIDINGLI avatar Sep 27 '24 08:09 GUIDINGLI

What about this workaround for high priority interrupts which is based on PendSV ? https://nuttx.apache.org/docs/latest/guides/zerolatencyinterrupts.html#getting-back-into-the-game

If PendSV is used for something else, then it's not possible.

I will look into the doc, and update it when necessary.

GUIDINGLI avatar Sep 27 '24 08:09 GUIDINGLI

@GUIDINGLI High Performance / Zero Latency is something different and I think you should test it too: https://cwiki.apache.org/confluence/display/NUTTX/High+Performance%2C+Zero+Latency+Interrupts

Got it, I will test the Zero Latency nested interrupt.

Please test it using the viewtool-stm32f107 (you will need to use a logic analyzer to see the pulses and measure the jitter) or the nucleo-f302r8 (you will need to use an oscilloscope)

acassis avatar Sep 27 '24 10:09 acassis

What about this workaround for high priority interrupts which is based on PendSV ? https://nuttx.apache.org/docs/latest/guides/zerolatencyinterrupts.html#getting-back-into-the-game

If PendSV is used for something else, then it's not possible.

Still possible since PendSV could continue dispatch to the original interrupt handler.

xiaoxiang781216 avatar Sep 28 '24 06:09 xiaoxiang781216

This PR exceeds the recommended size of 1000 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size.

github-actions[bot] avatar Sep 29 '24 09:09 github-actions[bot]

@GUIDINGLI High Performance / Zero Latency is something different and I think you should test it too: https://cwiki.apache.org/confluence/display/NUTTX/High+Performance%2C+Zero+Latency+Interrupts

Got it, I will test the Zero Latency nested interrupt.

Hi @GUIDINGLI did you test HighPerf/Zero Latency after your modification?

acassis avatar Oct 03 '24 15:10 acassis

@GUIDINGLI High Performance / Zero Latency is something different and I think you should test it too: https://cwiki.apache.org/confluence/display/NUTTX/High+Performance%2C+Zero+Latency+Interrupts

https://github.com/apache/nuttx-apps/pull/2646

GUIDINGLI avatar Oct 06 '24 04:10 GUIDINGLI

@GUIDINGLI High Performance / Zero Latency is something different and I think you should test it too: https://cwiki.apache.org/confluence/display/NUTTX/High+Performance%2C+Zero+Latency+Interrupts

Got it, I will test the Zero Latency nested interrupt.

Hi @GUIDINGLI did you test HighPerf/Zero Latency after your modification?

@acassis Here is the new testcase. https://github.com/apache/nuttx-apps/pull/2646

GUIDINGLI avatar Oct 06 '24 04:10 GUIDINGLI

@GUIDINGLI

I noticed that assertion happens during signest_test with spresense:smp (NCPUS=3).

signest_test: Simple case:
  Total signalled 1240  Odd=620 Even=620
  Total handled   1240  Odd=620 Even=620
  Total nested    0    Odd=0   Even=0  
signest_test: With task locking
  Total signalled 2480  Odd=1240 Even=1240
  Total handled   2480  Odd=1240 Even=1240
  Total nested    0    Odd=0   Even=0  
[CPU0] dump_assert_info: Current Version: NuttX  12.0.0 24527c1a00 Oct 11 2024 08:20:50 arm
[CPU0] dump_assert_info: Assertion failed : at file: :0 task(CPU0): ostest process: ostest 0xd010b19
[CPU0] up_dump_register: R0: 2d0305c0 R1: 00000001 R2: 00000000  R3: 0e002000
[CPU0] up_dump_register: R4: 00000000 R5: 2d033af8 R6: 00000000  FP: 00000000
[CPU0] up_dump_register: R8: 00000006 SB: 2d030090 SL: 2d0305c0 R11: 000000d8
[CPU0] up_dump_register: IP: 00000000 SP: 2d0387c0 LR: 0d004cff  PC: 0d004cff
[CPU0] up_dump_register: xPSR: 60000000 BASEPRI: 00000080 CONTROL: 00000006
[CPU0] up_dump_register: EXC_RETURN: 00000000
[CPU0] dump_stackinfo: User Stack:
[CPU0] dump_stackinfo:   base: 0x2d0369c0
[CPU0] dump_stackinfo:   size: 00008120
[CPU0] dump_stackinfo:     sp: 0x2d0387c0
[CPU0] stack_dump: 0x2d0387a0: 0d02b93f 2d0305c0 2d033af8 0d02ad45 0d02ad45 00000006 2d030090 0d004dbd
[CPU0] stack_dump: 0x2d0387c0: 0d02ad45 00000000 00000000 2d033ba4 2d033ba4 0d010b19 00000000 00000000
[CPU0] stack_dump: 0x2d0387e0: 00000000 00000000 2d033af8 2d0305c0 00000000 00000000 00000000 7474754e
[CPU0] stack_dump: 0x2d038800: 00000058 00000000 00000000 00000000 00000002 2d033b9c 2d032ea8 2d02fef8
[CPU0] stack_dump: 0x2d038820: 00000000 0d006773 0d006788 21000000 2e323100 00302e30 00000000 00000000
[CPU0] stack_dump: 0x2d038840: 00000000 34320000 63373235 30306131 74634f20 20313120 34323032 3a383020
[CPU0] stack_dump: 0x2d038860: 353a3032 ffff0030 2d02e7e0 2d033b4c 00000000 ffffffea 6d726100 ffffff00
[CPU0] stack_dump: 0x2d038880: 00000000 ffffff92 2d031d84 2d031d74 2d031d70 00000002 2d031d84 2d031d74
[CPU0] stack_dump: 0x2d0388a0: 2d031d70 0000002b 2d031d9c 0000000a 0000002a 0d00aa53 00000000 0d012bb5
[CPU0] stack_dump: 0x2d0388c0: 0000005a 1dcd6500 2d033af8 2d031d6c 2d031d78 0d012e33 ffffffff 00000066
[CPU0] stack_dump: 0x2d0388e0: 0000003a 0000003b 00010066 00000000 00000000 00002000 2d033cc8 00000005
[CPU0] stack_dump: 0x2d038900: 2d031c80 00000000 2d036988 0d02da9b 2d0369b6 00000000 00000000 0d010d2f
[CPU0] stack_dump: 0x2d038920: 0014d040 00000001 0000003f 00147680 000059c0 00147680 00005b28 00000000
[CPU0] stack_dump: 0x2d038940: 00000000 0d010b19 00000005 2d036988 00000000 00000000 00000000 0d00ad77
[CPU0] stack_dump: 0x2d038960: 0d010b19 0d007811 00000000 00000000 00000000 00000000 00000000 00000000

To reproduce the assertion, please apply the following changes.

diff --git a/boards/arm/cxd56xx/spresense/configs/smp/defconfig b/boards/arm/cxd56xx/spresense/configs/smp/defconfig
index 4209329e9e..e171235ea5 100644
--- a/boards/arm/cxd56xx/spresense/configs/smp/defconfig
+++ b/boards/arm/cxd56xx/spresense/configs/smp/defconfig
@@ -28,6 +28,8 @@ CONFIG_CXD56_I2C=y
 CONFIG_CXD56_SPI4=y
 CONFIG_CXD56_SPI5=y
 CONFIG_CXD56_SPI=y
+CONFIG_DEBUG_ASSERTIONS=y
+CONFIG_DEBUG_FEATURES=y
 CONFIG_DEBUG_FULLOPT=y
 CONFIG_DEBUG_SYMBOLS=y
 CONFIG_EXAMPLES_HELLO=y
@@ -36,6 +38,7 @@ CONFIG_FS_PROCFS_REGISTER=y
 CONFIG_HAVE_CXX=y
 CONFIG_HAVE_CXXINITIALIZE=y
 CONFIG_INIT_ENTRYPOINT="spresense_main"
+CONFIG_NDEBUG=y
 CONFIG_NSH_ARCHINIT=y
 CONFIG_NSH_BUILTIN_APPS=y
 CONFIG_NSH_READLINE=y
@@ -47,7 +50,7 @@ CONFIG_RR_INTERVAL=200
 CONFIG_RTC=y
 CONFIG_RTC_DRIVER=y
 CONFIG_SMP=y
-CONFIG_SMP_NCPUS=2
+CONFIG_SMP_NCPUS=3
 CONFIG_SPI=y
 CONFIG_START_DAY=3
 CONFIG_START_MONTH=10
@@ -57,5 +60,6 @@ CONFIG_SYSTEM_NSH=y
 CONFIG_SYSTEM_SYSTEM=y
 CONFIG_SYSTEM_TASKSET=y
 CONFIG_TESTING_OSTEST=y
+CONFIG_TESTING_OSTEST_LOOPS=10
 CONFIG_TESTING_SMP=y
 CONFIG_UART1_SERIAL_CONSOLE=y

masayuki2009 avatar Oct 11 '24 05:10 masayuki2009

Working on it, trying to find a sony board...

So, this issue can reproduce on Qemu ?

GUIDINGLI avatar Oct 11 '24 07:10 GUIDINGLI

So, this issue can reproduce on Qemu ?

It can not be reproducible on QEMU since armv7-m SMP target is not available on QEMU.

masayuki2009 avatar Oct 11 '24 08:10 masayuki2009

@GUIDINGLI

I noticed that assertion happens during signest_test with spresense:smp (NCPUS=3).

signest_test: Simple case:
  Total signalled 1240  Odd=620 Even=620
  Total handled   1240  Odd=620 Even=620
  Total nested    0    Odd=0   Even=0  
signest_test: With task locking
  Total signalled 2480  Odd=1240 Even=1240
  Total handled   2480  Odd=1240 Even=1240
  Total nested    0    Odd=0   Even=0  
[CPU0] dump_assert_info: Current Version: NuttX  12.0.0 24527c1a00 Oct 11 2024 08:20:50 arm
[CPU0] dump_assert_info: Assertion failed : at file: :0 task(CPU0): ostest process: ostest 0xd010b19
[CPU0] up_dump_register: R0: 2d0305c0 R1: 00000001 R2: 00000000  R3: 0e002000
[CPU0] up_dump_register: R4: 00000000 R5: 2d033af8 R6: 00000000  FP: 00000000
[CPU0] up_dump_register: R8: 00000006 SB: 2d030090 SL: 2d0305c0 R11: 000000d8
[CPU0] up_dump_register: IP: 00000000 SP: 2d0387c0 LR: 0d004cff  PC: 0d004cff
[CPU0] up_dump_register: xPSR: 60000000 BASEPRI: 00000080 CONTROL: 00000006
[CPU0] up_dump_register: EXC_RETURN: 00000000
[CPU0] dump_stackinfo: User Stack:
[CPU0] dump_stackinfo:   base: 0x2d0369c0
[CPU0] dump_stackinfo:   size: 00008120
[CPU0] dump_stackinfo:     sp: 0x2d0387c0
[CPU0] stack_dump: 0x2d0387a0: 0d02b93f 2d0305c0 2d033af8 0d02ad45 0d02ad45 00000006 2d030090 0d004dbd
[CPU0] stack_dump: 0x2d0387c0: 0d02ad45 00000000 00000000 2d033ba4 2d033ba4 0d010b19 00000000 00000000
[CPU0] stack_dump: 0x2d0387e0: 00000000 00000000 2d033af8 2d0305c0 00000000 00000000 00000000 7474754e
[CPU0] stack_dump: 0x2d038800: 00000058 00000000 00000000 00000000 00000002 2d033b9c 2d032ea8 2d02fef8
[CPU0] stack_dump: 0x2d038820: 00000000 0d006773 0d006788 21000000 2e323100 00302e30 00000000 00000000
[CPU0] stack_dump: 0x2d038840: 00000000 34320000 63373235 30306131 74634f20 20313120 34323032 3a383020
[CPU0] stack_dump: 0x2d038860: 353a3032 ffff0030 2d02e7e0 2d033b4c 00000000 ffffffea 6d726100 ffffff00
[CPU0] stack_dump: 0x2d038880: 00000000 ffffff92 2d031d84 2d031d74 2d031d70 00000002 2d031d84 2d031d74
[CPU0] stack_dump: 0x2d0388a0: 2d031d70 0000002b 2d031d9c 0000000a 0000002a 0d00aa53 00000000 0d012bb5
[CPU0] stack_dump: 0x2d0388c0: 0000005a 1dcd6500 2d033af8 2d031d6c 2d031d78 0d012e33 ffffffff 00000066
[CPU0] stack_dump: 0x2d0388e0: 0000003a 0000003b 00010066 00000000 00000000 00002000 2d033cc8 00000005
[CPU0] stack_dump: 0x2d038900: 2d031c80 00000000 2d036988 0d02da9b 2d0369b6 00000000 00000000 0d010d2f
[CPU0] stack_dump: 0x2d038920: 0014d040 00000001 0000003f 00147680 000059c0 00147680 00005b28 00000000
[CPU0] stack_dump: 0x2d038940: 00000000 0d010b19 00000005 2d036988 00000000 00000000 00000000 0d00ad77
[CPU0] stack_dump: 0x2d038960: 0d010b19 0d007811 00000000 00000000 00000000 00000000 00000000 00000000

To reproduce the assertion, please apply the following changes.

diff --git a/boards/arm/cxd56xx/spresense/configs/smp/defconfig b/boards/arm/cxd56xx/spresense/configs/smp/defconfig
index 4209329e9e..e171235ea5 100644
--- a/boards/arm/cxd56xx/spresense/configs/smp/defconfig
+++ b/boards/arm/cxd56xx/spresense/configs/smp/defconfig
@@ -28,6 +28,8 @@ CONFIG_CXD56_I2C=y
 CONFIG_CXD56_SPI4=y
 CONFIG_CXD56_SPI5=y
 CONFIG_CXD56_SPI=y
+CONFIG_DEBUG_ASSERTIONS=y
+CONFIG_DEBUG_FEATURES=y
 CONFIG_DEBUG_FULLOPT=y
 CONFIG_DEBUG_SYMBOLS=y
 CONFIG_EXAMPLES_HELLO=y
@@ -36,6 +38,7 @@ CONFIG_FS_PROCFS_REGISTER=y
 CONFIG_HAVE_CXX=y
 CONFIG_HAVE_CXXINITIALIZE=y
 CONFIG_INIT_ENTRYPOINT="spresense_main"
+CONFIG_NDEBUG=y
 CONFIG_NSH_ARCHINIT=y
 CONFIG_NSH_BUILTIN_APPS=y
 CONFIG_NSH_READLINE=y
@@ -47,7 +50,7 @@ CONFIG_RR_INTERVAL=200
 CONFIG_RTC=y
 CONFIG_RTC_DRIVER=y
 CONFIG_SMP=y
-CONFIG_SMP_NCPUS=2
+CONFIG_SMP_NCPUS=3
 CONFIG_SPI=y
 CONFIG_START_DAY=3
 CONFIG_START_MONTH=10
@@ -57,5 +60,6 @@ CONFIG_SYSTEM_NSH=y
 CONFIG_SYSTEM_SYSTEM=y
 CONFIG_SYSTEM_TASKSET=y
 CONFIG_TESTING_OSTEST=y
+CONFIG_TESTING_OSTEST_LOOPS=10
 CONFIG_TESTING_SMP=y
 CONFIG_UART1_SERIAL_CONSOLE=y

without any change, will not reproduce problem.

and only have to change CONFIG_SMP_NCPUS=3 able to reproduce problem. make signest_test the first one run ostest will keep reproduce. doing more detail locating.

jasonbu avatar Oct 11 '24 08:10 jasonbu

@masayuki2009 Expected to be completed by next Monday.

GUIDINGLI avatar Oct 12 '24 05:10 GUIDINGLI

@masayuki2009 do you have method or guide how to connect debugger to the spresense board, Jlink or other method, as we cannot re-produce it in sim or qemu environments.

jasonbu avatar Oct 12 '24 12:10 jasonbu

@masayuki2009 hi, hope you see the message. It take us more than expected time to make the re-produce stable, in some cases, log will make scenes disappear. and trying to debug by add critical logs to know what exactly happening inside this case. we are still working on it. If it's urgent. we can share more half progress code to you.

jasonbu avatar Oct 14 '24 17:10 jasonbu

So, this issue can reproduce on Qemu ?

It can not be reproducible on QEMU since armv7-m SMP target is not available on QEMU.

hi @masayuki2009 ,please review if #14363 can fix your problem.

jasonbu avatar Oct 16 '24 10:10 jasonbu

@GUIDINGLI @xiaoxiang781216 hi, this PR broke nrf52832-dk/timer example (nrf52840 also broken). No crash visible, system resets for no reason.

image

raiden00pl avatar Oct 19 '24 16:10 raiden00pl

timer example is also broken on nucleo-g070rb (cortex M0+) so we can assume that more boards are affected. But in this case we have a visible crash:

image

raiden00pl avatar Oct 19 '24 16:10 raiden00pl

@jasonbu could you look at the new issue reported by @raiden00pl ?

xiaoxiang781216 avatar Oct 21 '24 04:10 xiaoxiang781216