Travis Whitaker
Turns out I mistakenly used an atomic write to release the distribution spinlock, when a weak write is sufficient (and much faster). Now the primops implementation is comparable to the...
One way or another, #41 needs to be fixed for EKG to work well on newer ARM boards. If there's no interest in this patch, we should find another way...
@23Skidoo What more can I do to help get this package fixed on aarch64?
The `Atomic` values that we operate on with these C functions are in a `ForeignPtr` (https://github.com/tibbe/ekg-core/blob/f8d26b9a3806125694c25b9459ec0c4a0b2e87ce/Data/Atomic.hs#L22). Since `IORef` is just a `MutVar#`, using these C functions might even be _slower_...
Yikes, for some reason I thought that a MutVar's contents would be unboxed if the contained type was unboxed, but it seems that's not the case: https://gitlab.haskell.org/ghc/ghc/-/blob/master/includes/rts/storage/Closures.h#L181 `MutableByteArray#` has the...
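To make the boxed-versus-unboxed point concrete, here is a minimal sketch of a counter whose value is stored unboxed in a `MutableByteArray#` and updated with GHC's atomic primops. It is an illustration only, not the code from ekg-core or from #42; the names `Counter`, `newCounter`, `addCounter`, and `readCounter` are hypothetical, and it assumes a GHC recent enough to provide `fetchAddIntArray#` and `atomicReadIntArray#`.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}

import Foreign.Storable (sizeOf)
import GHC.Exts
  ( Int (I#), MutableByteArray#, RealWorld
  , atomicReadIntArray#, fetchAddIntArray#, newByteArray#, writeIntArray# )
import GHC.IO (IO (..))

-- Hypothetical type: one machine word stored directly in a byte array,
-- so there is no pointer to a boxed Int as there would be with an IORef.
data Counter = Counter (MutableByteArray# RealWorld)

-- Allocate one word and initialise it with a plain (non-atomic) write,
-- which is fine because no other thread can see the array yet.
newCounter :: Int -> IO Counter
newCounter (I# initial) = IO (\s0 ->
  case sizeOf (0 :: Int) of
    I# bytes -> case newByteArray# bytes s0 of
      (# s1, mba #) -> case writeIntArray# mba 0# initial s1 of
        s2 -> (# s2, Counter mba #))

-- Atomically add n to the counter and return the value it held before.
addCounter :: Counter -> Int -> IO Int
addCounter (Counter mba) (I# n) = IO (\s0 ->
  case fetchAddIntArray# mba 0# n s0 of
    (# s1, old #) -> (# s1, I# old #))

-- Atomically read the current value.
readCounter :: Counter -> IO Int
readCounter (Counter mba) = IO (\s0 ->
  case atomicReadIntArray# mba 0# s0 of
    (# s1, v #) -> (# s1, I# v #))
```

With these definitions, `newCounter 5` followed by `addCounter c 3` returns the previous value and leaves the incremented value for `readCounter c`. Whether this beats the C-function approach on a given platform is something to measure rather than assume.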
@tibbe is there any interest in fixing this bug (via #42 or otherwise)? As-is, ekg is totally broken on aarch64. CC @23Skidoo
I’d definitely like to get more eyes on #42 before it lands in a release. I’ve been using it since I opened the PR, but my use cases might not...
I was able to test the same kernel on a Mac with an Intel m3, so the wacky i9 hypothesis is out. Seems it is indeed macOS-related.
It would also be valuable to include types to represent the additional reduced-precision quantities that WMMA supports, such as short floats and four-bit ints. It may be difficult to...
Draft proposal:

---
title: Support NVPTX WMMA in Accelerate
---

[Accelerate](http://www.acceleratehs.org/) is a DSL for parallel array-based computations. A unique feature of the language is runtime compilation, allowing Haskell to...