x64: `WAITPKG`-based spinlocks
The `WAITPKG` CPUID flag indicates the availability of user-space instructions for monitoring an address range and waiting until that range is written to, in particular `umonitor` and `umwait`. These instructions are available on Tremont, Sapphire Rapids, and Alder Lake.
https://www.felixcloutier.com/x86/umonitor
https://www.felixcloutier.com/x86/umwait
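For reference, a minimal sketch of how the flag could be detected at runtime. It assumes GCC or Clang on x86-64 and uses `__get_cpuid_count` from `<cpuid.h>`; the names `kWaitpkgBit`, `ecx_has_waitpkg` and `cpu_has_waitpkg` are made up for illustration and are not part of xbyak or this project.

```cpp
#include <cpuid.h>  // GCC/Clang helper: __get_cpuid_count

// WAITPKG is reported in CPUID.(EAX=07H, ECX=0):ECX bit 5.
constexpr unsigned kWaitpkgBit = 5;

// Pure helper (hypothetical name) so the bit test can be checked
// without needing the hardware.
constexpr bool ecx_has_waitpkg(unsigned ecx) {
    return ((ecx >> kWaitpkgBit) & 1u) != 0;
}

// Returns true if this CPU advertises umonitor/umwait/tpause.
inline bool cpu_has_waitpkg() {
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
    // __get_cpuid_count returns 0 when leaf 7 is not supported at all.
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)) {
        return false;
    }
    return ecx_has_waitpkg(ecx);
}
```

The result of `cpu_has_waitpkg()` would be queried once at startup and used to choose between a `umwait`-based wait and the existing `pause` loop.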
I proposed the additions for xbyak here: https://github.com/herumi/xbyak/issues/143
These instructions can possibly be used to accelerate spinlocks.
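To illustrate the idea, here is a rough, untested-on-hardware sketch of a test-and-test-and-set spinlock that waits with `umonitor`/`umwait` instead of `pause`. It is not the draft implementation from this proposal. It assumes GCC or Clang (for the `target("waitpkg")` attribute and the `_umonitor`/`_umwait` intrinsics); the class name `WaitpkgSpinlock`, the `use_waitpkg` flag and the `kMaxWaitCycles` value are all hypothetical. The caller is assumed to set `use_waitpkg` only after a CPUID check confirms `WAITPKG` support, otherwise `umwait` would fault.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <immintrin.h>  // _mm_pause, _umonitor, _umwait
#include <x86intrin.h>  // __rdtsc

// Arbitrary illustrative upper bound on one wait, in TSC cycles.
constexpr std::uint64_t kMaxWaitCycles = 100000;

class WaitpkgSpinlock {
public:
    // use_waitpkg must only be true if CPUID reports WAITPKG.
    explicit WaitpkgSpinlock(bool use_waitpkg) : use_waitpkg_(use_waitpkg) {}

    void lock() {
        for (;;) {
            // exchange returns the previous value: false means we got the lock.
            if (!locked_.exchange(true, std::memory_order_acquire)) return;
            // Contended: wait until the flag looks free before retrying.
            while (locked_.load(std::memory_order_relaxed)) {
                if (use_waitpkg_) wait_umwait();
                else _mm_pause();
            }
        }
    }

    bool try_lock() {
        return !locked_.exchange(true, std::memory_order_acquire);
    }

    void unlock() {
        assert(locked_.load(std::memory_order_relaxed) && "lock not held");
        locked_.store(false, std::memory_order_release);
    }

private:
    // Compiled with WAITPKG enabled for this function only, so the rest of
    // the program still runs on CPUs without the extension.
    __attribute__((target("waitpkg")))
    void wait_umwait() {
        _umonitor(&locked_);  // arm the monitor on the flag's cache line
        // Re-check after arming so a release that happened in between
        // is not missed.
        if (!locked_.load(std::memory_order_relaxed)) return;
        // Control bit 0 = 1 requests C0.1 (lighter state, faster wakeup);
        // the second argument is an absolute TSC deadline.
        _umwait(1, __rdtsc() + kMaxWaitCycles);
    }

    std::atomic<bool> locked_{false};
    bool use_waitpkg_;
};
```

`umwait` can also return early (deadline, interrupt, OS-imposed limit), which is why the wait sits inside a loop that re-reads the flag. Whether C0.1 or C0.2 is the better choice here would need measuring on real hardware.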
The `WAITPKG` instructions have been added to xbyak as of https://github.com/herumi/xbyak/commit/898c354e67313b194efe3a66e0f502ed4dac35ed
I've made a draft of this, but I do not currently have the required hardware to validate that the implementation works or that it is faster than `pause`. Based on the instruction descriptions, though, it should be faster and more efficient on hyper-threaded cores and on systems with many cores.