
sched: Introduce Bound Multi-Processing (BMP) into NuttX

Open · anchao opened this issue · 13 comments

Summary


Bound multiprocessing provides the scheduling control of an asymmetric multiprocessing model, while preserving the hardware abstraction and management of symmetric multiprocessing.

BMP is similar to SMP, but you can specify which processors a thread can run on. You can use both SMP and BMP on the same system, allowing some threads to migrate from one processor to another, while other threads are restricted to one or more processors.

As with SMP, a single copy of the OS maintains an overall view of all system resources, allowing them to be dynamically allocated and shared among applications. However, during application initialization, a setting determined by the system designer forces all of an application's threads to execute only on a specified CPU.

Compared to full, floating SMP operation, this approach offers several advantages:

  1. It eliminates the cache thrashing that can reduce performance in an SMP system by allowing applications that share the same data set to run exclusively on the same CPU.
  2. It offers simpler application debugging than SMP since all execution threads within an application run on a single CPU.
  3. It helps legacy applications that use poor techniques for synchronizing shared data to run correctly, again by letting them run on a single CPU.

Bound Multi-Processing (BMP):

---------------------------------------------
|   APP 0  |  APP 1   |  APP 2   |  APP 3   |  <- Applications bound to CPUs
---------------------------------------------
|  Data[0] |  Data[1] |  Data[2] |  Data[3] |  <- NuttX kernel data, one instance per CPU
---------------------------------------------
|                Shared Code                |  <- NuttX kernel code shared by all CPUs
---------------------------------------------
|   UART 0 |   SPI 0  |   SPI 1  |   I2C 0  |  <- Each driver is registered only on the CPUs whose applications need it
---------------------------------------------
|  TIME 0  |  TIME 1  |  TIME 2  |  TIME 3  |  <- Core/CPU timers
---------------------------------------------
|   CPU0   |   CPU1   |   CPU2   |   CPU3   |  <- CPUs run independently
---------------------------------------------

Some subsystem data does not need to be duplicated, particularly the data of components bound to a single application. Shared hardware devices use a spinlock to avoid race conditions between cores.

---------------------------------------------
|   APP 0  |  APP 1   |  APP 2   |  APP 3   |
---------------------------------------------
| NetStack |  BTStack |  AUDIO   |   ...    |  <- Components bound to one application; their data need not be duplicated
---------------------------------------------
|                Shared Code                |
---------------------------------------------
|     Shared UART (Protected by Spinlock)   |  <- Driver shared by all CPUs, protected by a spinlock (e.g. for printing logs)
---------------------------------------------
|   CPU0   |   CPU1   |   CPU2   |   CPU3   |
---------------------------------------------

References:
  1. https://www.ghs.com/products/safety_critical/integrity_178_multicore.html
  2. https://www.qnx.com/developers/docs/7.1/#com.qnx.doc.neutrino.sys_arch/topic/smp_BMP.html
  3. https://www.nxp.com.cn/docs/en/brochure/PWRARBYNDBITSRAS.pdf

Signed-off-by: chao an [email protected]

Impact

Depends on: https://github.com/apache/nuttx-apps/pull/2342

N/A

Testing

qemu-armv7a/bmp ostest on single core

nuttx$ qemu-system-arm -cpu cortex-a7 -nographic -machine virt,virtualization=off,gic-version=2 -net none -chardev stdio,id=con,mux=on -serial chardev:con -mon chardev=con,mode=readline -kernel ./nuttx -smp 4

NuttShell (N
NuttS
Nutt
NH) NutShellShetSutlltX- (NSH) NuttX-10.4.0
4t(NheSH) NuttX-ns0.h> 1ll (NSH) Nu0.4.0
nstX-1h> 
nsh> ps
  PID GROUP PRI POLICY   TYPE    NPX STATE    EVENT     SIGMASK           STACK   USED  FILLED COMMAND
    0     0   0 FIFO     Kthread   - Ready              0000000000000000 004080 000536  13.1%  CPU0 IDLE
    1     1 192 RR       Kthread   - Waiting  Semaphore 0000000000000000 004032 000296   7.3%  hpwork 0x4013f51c 0x4013f530
    2     2 100 RR       Task      - Running            0000000000000000 004056 001168  28.7%  nsh_main
nsh> irqaff 33 1
nsh> ps
  PID GROUP PRI POLICY   TYPE    NPX STATE    EVENT     SIGMASK           STACK   USED  FILLED COMMAND
    0     0   0 FIFO     Kthread   - Ready              0000000000000000 004080 000736  18.0%  CPU1 IDLE
    1     1 192 RR       Kthread   - Waiting  Semaphore 0000000000000000 004032 000296   7.3%  hpwork 0x4013f544 0x4013f558
    2     2 100 RR       Task      - Running            0000000000000000 004056 001288  31.7%  nsh_main
nsh> irqaff 33 2
nsh> ps
  PID GROUP PRI POLICY   TYPE    NPX STATE    EVENT     SIGMASK           STACK   USED  FILLED COMMAND
    0     0   0 FIFO     Kthread   - Ready              0000000000000000 004080 000736  18.0%  CPU2 IDLE
    1     1 192 RR       Kthread   - Waiting  Semaphore 0000000000000000 004032 000296   7.3%  hpwork 0x4013f56c 0x4013f580
    2     2 100 RR       Task      - Running            0000000000000000 004056 001168  28.7%  nsh_main
nsh> irqaff 33 3
nsh> ps
  PID GROUP PRI POLICY   TYPE    NPX STATE    EVENT     SIGMASK           STACK   USED  FILLED COMMAND
    0     0   0 FIFO     Kthread   - Ready              0000000000000000 004080 000736  18.0%  CPU3 IDLE
    1     1 192 RR       Kthread   - Waiting  Semaphore 0000000000000000 004032 000296   7.3%  hpwork 0x4013f594 0x4013f5a8
    2     2 100 RR       Task      - Running            0000000000000000 004056 001168  28.7%  nsh_main

anchao, Mar 28 '24 11:03

@anchao could you split the irqaff change into a new PR, so that the change spanning apps/nuttx can be merged first? Since the remaining changes touch many files, it's better to ensure they can pass CI standalone.

xiaoxiang781216, Mar 29 '24 05:03

Can BMP ensure that if an application bound to a separate processor crashes, it will not affect other processors?

anjiahao1, Apr 02 '24 04:04

> Can BMP ensure that if an application bound to a separate processor crashes, it will not affect other processors?

Of course. This is just the initial BMP pull request; MPU protection and assertion-chain optimizations will be added in the future.

anchao, Apr 02 '24 07:04

>> Can BMP ensure that if an application bound to a separate processor crashes, it will not affect other processors?

> Of course. This is just the initial BMP pull request; MPU protection and assertion-chain optimizations will be added in the future.

Great!

anjiahao1, Apr 02 '24 07:04

Cool work! I'm curious whether this would work on an asymmetric system where each core has its own cache, e.g. a Cortex-M7 and a Cortex-M4 without hardware cache coherency?

PetervdPerk-NXP, Apr 02 '24 21:04

> Cool work! I'm curious whether this would work on an asymmetric system where each core has its own cache, e.g. a Cortex-M7 and a Cortex-M4 without hardware cache coherency?

Yes, but the implementation requires more platform-specific modifications. On platforms without hardware cache coherency, all shared data must be placed in cache-line-aligned sections, which depends on labeling specific data/bss sections in the linker script.

anchao, Apr 03 '24 00:04

What is the difference between this approach and the pthread_setaffinity_np() function implemented by NuttX? Are threads spawned by a task bound to a specific processor also automatically bound to that processor with this approach?

zouboan, Apr 06 '24 03:04

> What is the difference between this approach and the pthread_setaffinity_np() function implemented by NuttX? Are threads spawned by a task bound to a specific processor also automatically bound to that processor with this approach?

Please refer to the PR summary. Compared with SMP, BMP offers advantages in performance, stability, and isolation.

anchao, Apr 09 '24 01:04

>> What is the difference between this approach and the pthread_setaffinity_np() function implemented by NuttX? Are threads spawned by a task bound to a specific processor also automatically bound to that processor with this approach?

> Please refer to the PR summary. Compared with SMP, BMP offers advantages in performance, stability, and isolation.

I see. It seems that BMP resolves two problems with SMP processor affinity: constraining threads in third-party code, and constraining dynamically created threads.

zouboan, Apr 09 '24 12:04

@anchao please fix these conflicts and I will merge it

acassis, Jun 22 '24 22:06

> @anchao please fix these conflicts and I will merge it

@acassis the current implementation may affect how NuttX data resources are distributed. I need to evaluate whether there is a more flexible way to implement this feature. @xiaoxiang781216, please let me know if you have any suggestions.

anchao, Jun 24 '24 02:06

>> @anchao please fix these conflicts and I will merge it

> @acassis the current implementation may affect how NuttX data resources are distributed. I need to evaluate whether there is a more flexible way to implement this feature. @xiaoxiang781216, please let me know if you have any suggestions.

Do you still need to patch all global variables?

xiaoxiang781216, Jun 25 '24 11:06

> @anchao please include documentation about BMP in Documentation/

@acassis I will provide documentation later; this PR still needs further work. Thanks for the review.

anchao, Aug 30 '24 06:08