spinlock: Better recursive spinlock implementation.
Summary
This commit provides a better recursive spinlock implementation with lower memory overhead and better performance.
For example, on AArch64 the memory overhead decreases from 32 + 32 (padding) + 64 + 32 = 160 bits to 32 bits. 160-bit lock data may cross a cache-line boundary, resulting in significant performance degradation, whereas 32-bit lock data never crosses a cache line.
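For illustration, here is a minimal sketch of how a recursive spinlock can be packed into a single 32-bit word. This is not the actual NuttX implementation: the field layout, the `RSPIN_*` names, and the `this_cpu()` helper are all assumptions, and it presumes the lock is taken with interrupts disabled so the owner check cannot race with task migration.

```c
/* Sketch only -- field layout and names are assumptions, not the
 * NuttX code.  Owner CPU id lives in the high 16 bits and the
 * recursion count in the low 16 bits, so the whole lock fits in one
 * 32-bit word and never straddles a cache line.
 */

#include <stdint.h>
#include <stdatomic.h>

#define RSPIN_NOOWNER 0xffffu                 /* "Unowned" sentinel   */
#define RSPIN_PACK(cpu, cnt) (((uint32_t)(cpu) << 16) | (uint32_t)(cnt))

typedef _Atomic uint32_t rspinlock_t;

extern uint16_t this_cpu(void);               /* Assumed helper       */

static void rspin_lock(rspinlock_t *lock)
{
  uint16_t cpu = this_cpu();
  uint32_t old = atomic_load_explicit(lock, memory_order_relaxed);

  if ((old >> 16) == cpu)
    {
      /* Already held by this CPU: only the owner can reach here, so
       * a plain increment of the count is race-free.  (Count
       * overflow at 0xffff is not handled in this sketch.)
       */

      atomic_store_explicit(lock, old + 1, memory_order_relaxed);
      return;
    }

  /* Otherwise spin until the word moves from "unowned, count 0" to
   * "owned by us, count 1".
   */

  uint32_t unowned = RSPIN_PACK(RSPIN_NOOWNER, 0);
  uint32_t owned   = RSPIN_PACK(cpu, 1);
  uint32_t expect  = unowned;

  while (!atomic_compare_exchange_weak_explicit(lock, &expect, owned,
                                                memory_order_acquire,
                                                memory_order_relaxed))
    {
      expect = unowned;                       /* CAS rewrote expect   */
    }
}

static void rspin_unlock(rspinlock_t *lock)
{
  uint32_t old = atomic_load_explicit(lock, memory_order_relaxed);

  if ((old & 0xffffu) > 1)
    {
      /* Still nested: decrement the count and keep ownership. */

      atomic_store_explicit(lock, old - 1, memory_order_relaxed);
    }
  else
    {
      /* Last unlock: release the word back to the unowned state. */

      atomic_store_explicit(lock, RSPIN_PACK(RSPIN_NOOWNER, 0),
                            memory_order_release);
    }
}
```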
Impact
This commit may have an impact on memory usage and performance on SMP systems.
Testing
Tested on qemu-armv8a:nsh_smp.
Tested on qemu-intel64:nsh (KVM) using `sudo qemu-system-x86_64 -enable-kvm -cpu host,+invtsc,+vmware-cpuid-freq,kvmclock=off -m 2G -kernel nuttx -nographic -serial mon:stdio`.
A simple 1M-iteration lock-and-add loop test on an Intel Core 12700 machine shows the following per-operation latency under no contention:
| spinlock | rspinlock (before this commit) | rspinlock (after this commit) |
|---|---|---|
| 19 ns | 54 ns | 37 ns |
The new rspinlock implementation therefore achieves roughly 1.45x the throughput of the previous one (54 ns vs. 37 ns per operation).
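For reference, a sketch of the kind of 1M lock-and-add loop described above, reusing the hypothetical `rspin_lock`/`rspin_unlock` sketch from earlier. The timing method and the single-CPU stub are assumptions, not the actual benchmark used for the numbers in the table.

```c
/* Sketch of a 1M lock-and-add latency loop (not the exact benchmark
 * behind the numbers above).  Builds on the rspinlock sketch from
 * the Summary section.
 */

#include <stdint.h>
#include <stdio.h>
#include <time.h>

#define ITERATIONS 1000000

/* Single-threaded stand-in for the CPU id helper assumed earlier. */

uint16_t this_cpu(void)
{
  return 0;
}

int main(void)
{
  static rspinlock_t lock = RSPIN_PACK(RSPIN_NOOWNER, 0);
  volatile uint64_t counter = 0;
  struct timespec start;
  struct timespec end;

  clock_gettime(CLOCK_MONOTONIC, &start);

  for (int i = 0; i < ITERATIONS; i++)
    {
      rspin_lock(&lock);
      counter++;                              /* The protected "add"  */
      rspin_unlock(&lock);
    }

  clock_gettime(CLOCK_MONOTONIC, &end);

  uint64_t ns = (uint64_t)(end.tv_sec - start.tv_sec) * 1000000000ull
              + (uint64_t)(end.tv_nsec - start.tv_nsec);

  printf("latency per op: %llu ns\n",
         (unsigned long long)(ns / ITERATIONS));
  return 0;
}
```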
@Fix-Point nice! Please include some testing on a real machine as well (QEMU tests sometimes will not catch some issues). If you can, please also include some performance tests; maybe the benchmarks in NuttX will prove this solution is better than the original.