Optimize atomic copies
Paul (@pauldreik) has come up with system-specific atomic-safe copy functions (https://github.com/simdutf/simdutf/pull/767). They seem to be based on empirical work by Erik Rigtorp (https://rigtorp.se/isatomic/) for x64 systems. There is a caveat: even if the operations are indeed atomic in practice, we still need our users (e.g., Google V8) to agree that they are atomic. The x64 ISA is quite varied, and it would be nice to have an authoritative reference.
We are currently working with faster 64-bit copies, which should work well portably and have decent speed.
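For illustration, a portable 64-bit copy could look roughly like the sketch below. It assumes the shared buffer is exposed as an array of `std::atomic<std::uint64_t>`; the helper name `atomic_copy_words` is hypothetical and is not taken from simdutf or from the PR.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not simdutf API): copies `count` 64-bit words using
// one relaxed atomic load and one relaxed atomic store per word.
// Each individual word is transferred without tearing; the copy as a whole
// is NOT atomic, and relaxed ordering gives no ordering between words.
inline void atomic_copy_words(const std::atomic<std::uint64_t>* src,
                              std::atomic<std::uint64_t>* dst,
                              std::size_t count) {
  for (std::size_t i = 0; i < count; ++i) {
    const std::uint64_t word = src[i].load(std::memory_order_relaxed);
    dst[i].store(word, std::memory_order_relaxed);
  }
}
```

On mainstream 64-bit targets, `std::atomic<std::uint64_t>` is normally lock-free, so each load and store should compile to a plain 8-byte move; `std::atomic<std::uint64_t>::is_always_lock_free` (C++17) can confirm this at compile time.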
AFAIR both Intel and AMD guarantee that on x64, aligned 64-bit moves are always atomic, and that on CPUs with AVX, aligned VEX-encoded 16B moves are atomic too. Outside of that, nothing more is guaranteed to my knowledge.
Indeed, we could use 16B aligned moves on x64 systems.
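A rough sketch of what a 16B aligned move might look like with SSE2 intrinsics follows. The helper names `is_aligned_16` and `copy_16_aligned` are hypothetical. The sketch assumes that the compiler emits a single aligned 16-byte load and store (which the C++ standard does not promise) and that the CPU supports AVX, so the hardware guarantee mentioned above applies. On non-x64 targets it falls back to `memcpy` with no atomicity claim.

```cpp
#include <cstdint>
#include <cstring>
#if defined(__x86_64__) || defined(_M_X64)
#include <emmintrin.h>  // SSE2: _mm_load_si128 / _mm_store_si128
#endif

// Hypothetical helper: true if `p` is 16-byte aligned.
inline bool is_aligned_16(const void* p) {
  return (reinterpret_cast<std::uintptr_t>(p) & 15u) == 0;
}

// Hypothetical helper (not simdutf API): copies 16 bytes with one aligned
// 16-byte load and one aligned 16-byte store on x64.
// Returns false, without copying, if either pointer is not 16-byte aligned.
// Assumption: hardware atomicity holds only on AVX-capable CPUs, and only
// if the compiler keeps each access as a single instruction.
inline bool copy_16_aligned(const void* src, void* dst) {
  if (!is_aligned_16(src) || !is_aligned_16(dst)) {
    return false;
  }
#if defined(__x86_64__) || defined(_M_X64)
  const __m128i value = _mm_load_si128(static_cast<const __m128i*>(src));
  _mm_store_si128(static_cast<__m128i*>(dst), value);
#else
  std::memcpy(dst, src, 16);  // fallback: no atomicity claim
#endif
  return true;
}
```

Unaligned heads and tails would still need the narrower path, for example the 64-bit or byte-wise copies.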
On ARM64, IIRC LSE2 guarantees that aligned 16B LDP/STP are atomic too. Without it, IIRC they are only guaranteed to behave as 2x8B accesses, so each 8B half is still atomic. Not sure how that plays with the weak ARM64 memory model, though.
Reference
https://ibraheem.ca/posts/128-bit-atomics/