[REQ] Atomic functions for Tiles
Description
Expose atomic functions for tiles, analogous to the atomic operations that CUDA C++ allows on shared memory.
Context
Currently, Warp only exposes atomic functions on arrays. However, it's often useful to first perform atomic operations on tiles within a block (CTA), as with CUDA C++ shared memory, and only then write the results to global memory.
~~In addition, operators such as += and -= are currently atomic when operating on arrays stored in global memory, but are not atomic when operating on tiles, which may cause some confusion. This can also be fixed in this feature request.~~
(Update: I just noticed that in-place operators on tiles are actually atomic.)
The current way to perform atomic operations on tiles seems to require falling back to CUDA C++ shared memory: https://github.com/NVIDIA/warp/discussions/298
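For reference, here is a minimal sketch of the CUDA C++ pattern I have in mind: accumulate with atomics in shared memory within a block, then flush to global memory once per block. This is not the code from the linked discussion. The kernel name `block_histogram`, the host helper `count_histogram_mismatches`, and the constants `NUM_BINS` and `BLOCK_DIM` are all made up for illustration, and error checking is omitted for brevity.

```cuda
#include <vector>
#include <cuda_runtime.h>

constexpr int NUM_BINS = 16;    // illustrative bin count (assumption)
constexpr int BLOCK_DIM = 128;  // threads per block (assumption)

// Each block accumulates into shared memory first, then flushes once to global memory.
__global__ void block_histogram(const int* data, int n, unsigned int* global_bins)
{
    __shared__ unsigned int local_bins[NUM_BINS];

    // Cooperatively zero the block-local bins.
    for (int b = threadIdx.x; b < NUM_BINS; b += blockDim.x)
        local_bins[b] = 0u;
    __syncthreads();

    // Grid-stride loop: these atomics only contend with threads of the same block.
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += gridDim.x * blockDim.x)
        atomicAdd(&local_bins[data[i] % NUM_BINS], 1u);
    __syncthreads();

    // One global atomic per bin per block.
    for (int b = threadIdx.x; b < NUM_BINS; b += blockDim.x)
        atomicAdd(&global_bins[b], local_bins[b]);
}

// Hypothetical host helper: runs the kernel on non-negative test data and returns
// the number of bins that differ from a CPU reference (0 means the results agree).
int count_histogram_mismatches(int n)
{
    std::vector<int> h_data(n);
    std::vector<unsigned int> expected(NUM_BINS, 0u), h_bins(NUM_BINS, 0u);
    for (int i = 0; i < n; ++i) {
        h_data[i] = (i * 7 + 3) % 1000;
        expected[h_data[i] % NUM_BINS]++;
    }

    int* d_data = nullptr;
    unsigned int* d_bins = nullptr;
    cudaMalloc(&d_data, n * sizeof(int));
    cudaMalloc(&d_bins, NUM_BINS * sizeof(unsigned int));
    cudaMemcpy(d_data, h_data.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemset(d_bins, 0, NUM_BINS * sizeof(unsigned int));

    block_histogram<<<32, BLOCK_DIM>>>(d_data, n, d_bins);
    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(h_bins.data(), d_bins, NUM_BINS * sizeof(unsigned int), cudaMemcpyDeviceToHost);

    cudaFree(d_data);
    cudaFree(d_bins);

    int mismatches = 0;
    for (int b = 0; b < NUM_BINS; ++b)
        mismatches += (h_bins[b] != expected[b]);
    return mismatches;
}
```

The point of the two-stage design is that most of the contention stays in fast shared memory, and each block issues only `NUM_BINS` global atomics. Tile atomics in Warp would let the same pattern be written without leaving Python.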
I would be happy to open a PR for this, and would appreciate some guidance.
Although atomic_* functions for tiles are not yet available, equivalent functionality can now be achieved using the in-place operators +=, -=, &=, |=, and ^= on tiles. This covers most use cases.
Support for atomic in-place bitwise operators on tiles was added in PR https://github.com/NVIDIA/warp/pull/887
The in-place atomic operators are sufficient for my use case. However, I'll keep this issue open in case future use cases require explicit atomic_* functions for tiles, or if additional atomics (CAS, EXCH, MAX, MIN, etc.) become necessary. Please share relevant use cases in this thread. Thanks!
Related: https://github.com/NVIDIA/warp/issues/886#issuecomment-3145326127