sched: Introduce Bound Multi-Processing (BMP) into NuttX
Summary
Bound multiprocessing provides the scheduling control of an asymmetric multiprocessing model, while preserving the hardware abstraction and management of symmetric multiprocessing.
BMP is similar to SMP, but you can specify which processors a thread can run on. You can use both SMP and BMP on the same system, allowing some threads to migrate from one processor to another, while other threads are restricted to one or more processors.
As with SMP, a single copy of the OS maintains an overall view of all system resources, allowing them to be dynamically allocated and shared among applications. But, during application initialization, a setting determined by the system designer forces all of an application's threads to execute only on a specified CPU.
Compared to full, floating SMP operation, this approach offers several advantages:
- It eliminates the cache thrashing that can reduce performance in an SMP system by allowing applications that share the same data set to run exclusively on the same CPU.
- It offers simpler application debugging than SMP since all execution threads within an application run on a single CPU.
- It helps legacy applications that use poor techniques for synchronizing shared data to run correctly, again by letting them run on a single CPU.
Bound Multi-Processing (BMP):
---------------------------------------------
| APP 0 | APP 1 | APP 2 | APP 3 | <- Application bound to CPU
---------------------------------------------
| Data[0] | Data[1] | Data[2] | Data[3] | <- NuttX Kernel Data supports multiple CPU instances
---------------------------------------------
| Share Code | <- NuttX kernel code shared for all CPUs
---------------------------------------------
| UART 0 | SPI 0 | SPI 1 | I2C 0 | <- Drivers are registered only on the CPUs whose applications need them
---------------------------------------------
| TIME 0 | TIME 1 | TIME 2 | TIME 3 | <- Core/CPU timers
---------------------------------------------
| CPU0 | CPU1 | CPU2 | CPU3 | <- CPUs run independently
---------------------------------------------
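The "Data[n]" row above can be pictured as one private copy of kernel state per CPU, selected by CPU index. The following is only a minimal sketch of that idea, not the code from this PR: `NCPUS`, `struct cpu_data_s`, `cpu_data()` and `cpu_timer_tick()` are hypothetical names chosen for illustration.

```c
#include <assert.h>

#define NCPUS 4 /* hypothetical CPU count, matching the diagram above */

/* Hypothetical per-CPU kernel state: each CPU owns exactly one instance. */

struct cpu_data_s
{
  int ready_tasks;     /* number of tasks on this CPU's private ready list */
  unsigned long ticks; /* ticks counted by this CPU's own timer */
};

static struct cpu_data_s g_cpu_data[NCPUS];

/* Return the private data block that belongs to one CPU. */

struct cpu_data_s *cpu_data(int cpu)
{
  assert(cpu >= 0 && cpu < NCPUS);
  return &g_cpu_data[cpu];
}

/* Body of a per-CPU timer tick: it touches only the given CPU's copy,
 * so no lock is needed as long as each CPU passes its own index.
 */

void cpu_timer_tick(int cpu)
{
  cpu_data(cpu)->ticks++;
}
```

In this sketch, calling `cpu_timer_tick(1)` twice leaves `cpu_data(1)->ticks` at 2 while `cpu_data(0)->ticks` stays at 0, which is the isolation the diagram is meant to convey.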
Some subsystem data does not need to be duplicated, especially for components that are bound to a single application. Hardware devices shared between cores are protected with a spinlock to avoid race conditions.
---------------------------------------------
| APP 0 | APP 1 | APP 2 | APP 3 |
---------------------------------------------
| NetStack | BTStack | AUDIO | ... | <- Components bound to the application; their data does not need to be duplicated
---------------------------------------------
| Share Code |
---------------------------------------------
| Share UART (Protected by Spinlock) | <- A driver shared by all CPUs is protected by a spinlock (e.g. for printing logs)
---------------------------------------------
| CPU0 | CPU1 | CPU2 | CPU3 |
---------------------------------------------
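The spinlock-protected shared UART can be sketched as follows. This is an illustration under stated assumptions, not the driver code from this PR: it uses a C11 `atomic_flag` instead of the NuttX spinlock API, and `g_uart_buf`, `uart_lock()`, `uart_unlock()` and `shared_uart_write()` are hypothetical names, with a memory buffer standing in for the UART FIFO.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

/* One lock and one output buffer shared by every CPU. The buffer is a
 * stand-in for the UART transmit path.
 */

static atomic_flag g_uart_lock = ATOMIC_FLAG_INIT;
static char g_uart_buf[256];
static size_t g_uart_len;

/* Busy-wait until the flag was previously clear, i.e. the lock is ours. */

void uart_lock(void)
{
  while (atomic_flag_test_and_set_explicit(&g_uart_lock,
                                           memory_order_acquire))
    {
    }
}

void uart_unlock(void)
{
  atomic_flag_clear_explicit(&g_uart_lock, memory_order_release);
}

/* Append a whole message while holding the lock, so that messages from
 * different CPUs cannot interleave. Returns the number of bytes stored.
 */

size_t shared_uart_write(const char *msg)
{
  size_t len;
  size_t room;

  assert(msg != NULL);
  len = strlen(msg);

  uart_lock();
  room = sizeof(g_uart_buf) - g_uart_len;
  if (len > room)
    {
      len = room; /* truncate instead of overflowing the buffer */
    }

  memcpy(&g_uart_buf[g_uart_len], msg, len);
  g_uart_len += len;
  uart_unlock();

  return len;
}
```

Holding the lock for the whole message rather than per character is a design choice: it keeps each log line intact at the cost of longer spin times on the other CPUs.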
Reference:
https://www.ghs.com/products/safety_critical/integrity_178_multicore.html
https://www.qnx.com/developers/docs/7.1/#com.qnx.doc.neutrino.sys_arch/topic/smp_BMP.html
https://www.nxp.com.cn/docs/en/brochure/PWRARBYNDBITSRAS.pdf
Signed-off-by: chao an [email protected]
Impact
Depends on: https://github.com/apache/nuttx-apps/pull/2342
N/A
Testing
qemu-armv7a/bmp ostest on single core
nuttx$ qemu-system-arm -cpu cortex-a7 -nographic -machine virt,virtualization=off,gic-version=2 -net none -chardev stdio,id=con,mux=on -serial chardev:con -mon chardev=con,mode=readline -kernel ./nuttx -smp 4
NuttShell (N
NuttS
Nutt
NH) NutShellShetSutlltX- (NSH) NuttX-10.4.0
4t(NheSH) NuttX-ns0.h> 1ll (NSH) Nu0.4.0
nstX-1h>
nsh> ps
PID GROUP PRI POLICY TYPE NPX STATE EVENT SIGMASK STACK USED FILLED COMMAND
0 0 0 FIFO Kthread - Ready 0000000000000000 004080 000536 13.1% CPU0 IDLE
1 1 192 RR Kthread - Waiting Semaphore 0000000000000000 004032 000296 7.3% hpwork 0x4013f51c 0x4013f530
2 2 100 RR Task - Running 0000000000000000 004056 001168 28.7% nsh_main
nsh> irqaff 33 1
nsh> ps
PID GROUP PRI POLICY TYPE NPX STATE EVENT SIGMASK STACK USED FILLED COMMAND
0 0 0 FIFO Kthread - Ready 0000000000000000 004080 000736 18.0% CPU1 IDLE
1 1 192 RR Kthread - Waiting Semaphore 0000000000000000 004032 000296 7.3% hpwork 0x4013f544 0x4013f558
2 2 100 RR Task - Running 0000000000000000 004056 001288 31.7% nsh_main
nsh> irqaff 33 2
nsh> ps
PID GROUP PRI POLICY TYPE NPX STATE EVENT SIGMASK STACK USED FILLED COMMAND
0 0 0 FIFO Kthread - Ready 0000000000000000 004080 000736 18.0% CPU2 IDLE
1 1 192 RR Kthread - Waiting Semaphore 0000000000000000 004032 000296 7.3% hpwork 0x4013f56c 0x4013f580
2 2 100 RR Task - Running 0000000000000000 004056 001168 28.7% nsh_main
nsh> irqaff 33 3
nsh> ps
PID GROUP PRI POLICY TYPE NPX STATE EVENT SIGMASK STACK USED FILLED COMMAND
0 0 0 FIFO Kthread - Ready 0000000000000000 004080 000736 18.0% CPU3 IDLE
1 1 192 RR Kthread - Waiting Semaphore 0000000000000000 004032 000296 7.3% hpwork 0x4013f594 0x4013f5a8
2 2 100 RR Task - Running 0000000000000000 004056 001168 28.7% nsh_main
@anchao could you split the irqaff change into a new PR, so that the change spanning apps/nuttx can be merged first? Since the remaining change touches many files, it is better to make sure it can pass CI on its own.
Can BMP ensure that if an application bound to a separate processor crashes, it will not affect other processors?
Of course. This is just the initial pull request for BMP; MPU protection and assertion-chain optimizations will be added in the future.
Great!
Cool work. I'm curious: would this work on an asymmetric system where each core has its own caches, e.g. Cortex-M7 and Cortex-M4, but without hardware cache coherency?
Yes, but the implementation requires more customized modifications. On platforms without hardware cache coherency, all data must be placed correctly in cache-line-aligned sections, which depends on labeling specific data/bss in the linker script.
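As a rough illustration of what cache-line-aligned, labeled data could look like on the C side, here is a minimal sketch. Everything in it is an assumption made for the example: the 32-byte `CACHE_LINE_SIZE`, the `.cpu1_data` section name, and the `CPU1_DATA` macro are hypothetical, and the matching linker script entry is not shown.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* Assumed cache line size; check the reference manual of the target. */

#define CACHE_LINE_SIZE 32

/* Hypothetical label that a linker script could use to collect all data
 * owned by CPU1 into one region. Only applied on GCC-compatible ELF
 * toolchains; elsewhere the variable lands in the default data section.
 */

#if defined(__GNUC__) && defined(__ELF__)
#  define CPU1_DATA __attribute__((section(".cpu1_data")))
#else
#  define CPU1_DATA
#endif

/* Data private to one CPU. It starts on a cache line boundary and is
 * padded to a full line, so it never shares a line with another CPU's
 * data and cache maintenance on it cannot disturb a neighbor.
 */

struct cpu_private_s
{
  alignas(CACHE_LINE_SIZE) uint32_t counter;
  uint8_t pad[CACHE_LINE_SIZE - sizeof(uint32_t)];
};

static_assert(sizeof(struct cpu_private_s) % CACHE_LINE_SIZE == 0,
              "per-CPU data must occupy whole cache lines");

CPU1_DATA struct cpu_private_s g_cpu1_private = { 0 };
```

The `static_assert` turns a layout mistake into a build error rather than a hard-to-find coherency bug at run time.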
What is the difference between this approach and the pthread_setaffinity_np functions implemented by NuttX? With this approach, are threads spawned by a task bound to a specific processor automatically bound to that processor as well?
Please refer to the PR summary. Compared with SMP, BMP offers better performance, stability, and isolation.
I see. It seems that BMP can resolve two problems of SMP processor affinity: constraining threads in third-party code, and constraining dynamically created threads.
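For comparison, the per-thread affinity approach mentioned in this thread can be sketched as below. This is not the BMP mechanism from this PR. It assumes Linux with glibc and the GNU extensions `pthread_setaffinity_np()` and `pthread_getaffinity_np()`; `bind_and_spawn()` and `count_allowed_cpus()` are hypothetical helper names. On Linux a new thread inherits its creator's affinity mask; whether other systems do the same is not something this sketch establishes.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sched.h>

/* Thread body: report how many CPUs this thread is allowed to run on. */

static void *count_allowed_cpus(void *arg)
{
  cpu_set_t set;
  int *count = arg;

  assert(count != NULL);
  CPU_ZERO(&set);
  if (pthread_getaffinity_np(pthread_self(), sizeof(set), &set) == 0)
    {
      *count = CPU_COUNT(&set);
    }

  return NULL;
}

/* Pin the calling thread to the first CPU in its current mask, spawn a
 * thread with default attributes, and return the number of CPUs in the
 * new thread's mask (or -1 on error).
 */

int bind_and_spawn(void)
{
  cpu_set_t set;
  pthread_t tid;
  int count = -1;
  int cpu;

  CPU_ZERO(&set);
  if (sched_getaffinity(0, sizeof(set), &set) != 0)
    {
      return -1;
    }

  /* Pick a CPU we are already allowed to use. */

  for (cpu = 0; cpu < CPU_SETSIZE && !CPU_ISSET(cpu, &set); cpu++)
    {
    }

  if (cpu == CPU_SETSIZE)
    {
      return -1;
    }

  CPU_ZERO(&set);
  CPU_SET(cpu, &set);
  if (pthread_setaffinity_np(pthread_self(), sizeof(set), &set) != 0)
    {
      return -1;
    }

  if (pthread_create(&tid, NULL, count_allowed_cpus, &count) != 0)
    {
      return -1;
    }

  pthread_join(tid, NULL);
  return count;
}
```

The limitation raised in the comment above still applies to this approach: the binding only holds if the application itself sets it before creating threads, and any code that later calls `pthread_setaffinity_np()` can undo it.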
@anchao please fix these conflicts and I will merge it
@acassis the current implementation may impact the distribution of NuttX data resources. I need to evaluate further whether there is a more flexible way to implement this feature. @xiaoxiang781216 please let me know if you have any suggestions.
Do you still need to patch all global variables?
@anchao please include Documentation/ about BMP
@acassis I will provide documentation later; this PR still needs further enhancement. Thanks for the review.