Atomic C functions are not atomic.
hs_atomic_read and hs_atomic_write aren't atomic at all on machines without strong memory models (total read and store ordering). To see an example of the kind of havoc this bad assumption can cause, see this (thankfully now fixed) GHC issue https://gitlab.haskell.org/ghc/ghc/-/issues/15449
Here's the aarch64 code that GCC 8.2 yields for these functions: https://godbolt.org/z/giVdtx No barriers or acq/rel lda/sta emitted...
I'm curious why C functions are used for these atomic operations at all. Are these functions really that much faster than atomicModifyIORef?
The Atomic values that we operate on with these C functions are in a ForeignPtr https://github.com/tibbe/ekg-core/blob/f8d26b9a3806125694c25b9459ec0c4a0b2e87ce/Data/Atomic.hs#L22
Since IORef is just a MutVar#, using these C functions might even be slower than an IORef.
I'd be happy to take a crack at moving over to IORef, but perhaps there's something I'm missing (or I'm standing on the grave of some really tricky bug...).
IIRC I measured it and it was much faster. In particular a MutVar# update implies allocation for the heap value it refers to.
Yikes, for some reason I thought that a MutVar's contents would be unboxed if the contained type was unboxed, but it seems that's not the case: https://gitlab.haskell.org/ghc/ghc/-/blob/master/includes/rts/storage/Closures.h#L181
MutableByteArray# has the required atomic operations, though. So we could sprinkle __sync__synchronize() around, or give MutableByteArray# a spin.
@tibbe is there any interest in fixing this bug (via #42 or otherwise)? As-is, ekg is totally broken on aarch64.
CC @23Skidoo
I'll look into cutting new releases during the weekend.
I’d definitely like to get more eyes on #42 before it lands in a release. I’ve been using it since I opened the PR, but my use cases might not hit certain issues.