spinlock: Better recursive spinlock implementation.
Summary
This commit provides a better recursive spinlock implementation with lower memory overhead and better performance.
For example, on AArch64 the memory overhead decreases from 32 + 32 (padding) + 64 + 32 = 160 bits to 32 bits. 160-bit lock data may cross a cache-line boundary, resulting in significant performance degradation, whereas 32-bit lock data never crosses a cache line.
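For illustration, here is a minimal sketch of how a recursive spinlock can be packed into a single 32-bit word. This is not the actual NuttX implementation: the field layout, the `RSPIN_*` names, and the `this_cpu()` helper are all assumptions, and it presumes the lock is taken with interrupts disabled so the owner check cannot race with task migration.

```c
/* Sketch only -- field layout and names are assumptions, not the
 * NuttX code.  Owner CPU id lives in the high 16 bits and the
 * recursion count in the low 16 bits, so the whole lock fits in one
 * 32-bit word and never straddles a cache line.
 */

#include <stdint.h>
#include <stdatomic.h>

#define RSPIN_NOOWNER 0xffffu                 /* "Unowned" sentinel   */
#define RSPIN_PACK(cpu, cnt) (((uint32_t)(cpu) << 16) | (uint32_t)(cnt))

typedef _Atomic uint32_t rspinlock_t;

extern uint16_t this_cpu(void);               /* Assumed helper       */

static void rspin_lock(rspinlock_t *lock)
{
  uint16_t cpu = this_cpu();
  uint32_t old = atomic_load_explicit(lock, memory_order_relaxed);

  if ((old >> 16) == cpu)
    {
      /* Already held by this CPU: only the owner can reach here, so
       * a plain increment of the count is race-free.  (Count
       * overflow at 0xffff is not handled in this sketch.)
       */

      atomic_store_explicit(lock, old + 1, memory_order_relaxed);
      return;
    }

  /* Otherwise spin until the word moves from "unowned, count 0" to
   * "owned by us, count 1".
   */

  uint32_t unowned = RSPIN_PACK(RSPIN_NOOWNER, 0);
  uint32_t owned   = RSPIN_PACK(cpu, 1);
  uint32_t expect  = unowned;

  while (!atomic_compare_exchange_weak_explicit(lock, &expect, owned,
                                                memory_order_acquire,
                                                memory_order_relaxed))
    {
      expect = unowned;                       /* CAS rewrote expect   */
    }
}

static void rspin_unlock(rspinlock_t *lock)
{
  uint32_t old = atomic_load_explicit(lock, memory_order_relaxed);

  if ((old & 0xffffu) > 1)
    {
      /* Still nested: decrement the count and keep ownership. */

      atomic_store_explicit(lock, old - 1, memory_order_relaxed);
    }
  else
    {
      /* Last unlock: release the word back to the unowned state. */

      atomic_store_explicit(lock, RSPIN_PACK(RSPIN_NOOWNER, 0),
                            memory_order_release);
    }
}
```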
Impact
This commit may have an impact on memory usage and performance on SMP systems.
Testing
Tested on qemu-armv8a:nsh_smp.
Tested on qemu-intel64:nsh (KVM) using `sudo qemu-system-x86_64 -enable-kvm -cpu host,+invtsc,+vmware-cpuid-freq,kvmclock=off -m 2G -kernel nuttx -nographic -serial mon:stdio`.
A simple 1M-iteration lock-and-add loop test on an Intel Core 12700 machine shows the following per-operation latency under no contention:
| spinlock | rspinlock (before this commit) | rspinlock (after this commit) |
|---|---|---|
| 19 ns | 54 ns | 37 ns |
The new rspinlock implementation therefore achieves roughly 1.45x the throughput of the previous one (54 ns vs. 37 ns per operation).
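For reference, a sketch of the kind of 1M lock-and-add loop described above, reusing the hypothetical `rspin_lock`/`rspin_unlock` sketch from earlier. The timing method and the single-CPU stub are assumptions, not the actual benchmark used for the numbers in the table.

```c
/* Sketch of a 1M lock-and-add latency loop (not the exact benchmark
 * behind the numbers above).  Builds on the rspinlock sketch from
 * the Summary section.
 */

#include <stdint.h>
#include <stdio.h>
#include <time.h>

#define ITERATIONS 1000000

/* Single-threaded stand-in for the CPU id helper assumed earlier. */

uint16_t this_cpu(void)
{
  return 0;
}

int main(void)
{
  static rspinlock_t lock = RSPIN_PACK(RSPIN_NOOWNER, 0);
  volatile uint64_t counter = 0;
  struct timespec start;
  struct timespec end;

  clock_gettime(CLOCK_MONOTONIC, &start);

  for (int i = 0; i < ITERATIONS; i++)
    {
      rspin_lock(&lock);
      counter++;                              /* The protected "add"  */
      rspin_unlock(&lock);
    }

  clock_gettime(CLOCK_MONOTONIC, &end);

  uint64_t ns = (uint64_t)(end.tv_sec - start.tv_sec) * 1000000000ull
              + (uint64_t)(end.tv_nsec - start.tv_nsec);

  printf("latency per op: %llu ns\n",
         (unsigned long long)(ns / ITERATIONS));
  return 0;
}
```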
@Fix-Point nice! Please include some testing on a real machine as well (QEMU tests sometimes will not catch some issues). If you can, please also include some performance tests; maybe the benchmarks in NuttX will prove this solution is better than the original.