chapel
chapel copied to clipboard
Support atomic min/max
Our atomic support is largely based on C11/C++11, so which operations we support is also largely the same. The main current exception is that we also support operations on real's, but C++20 added support for that too. I think there'd be value in adding support for an atomic min and max as well. We have a few tests that manually implement atomic max (see 9689096677884fcf35b56ba7c22ec5172c65039b) and we have support for min/max reductions, which could be optimized by doing the cross task reductions with atomics.
For other motivation, there's hardware support in some HPC networks and most communication libraries support them (including gasnet, ugni and libfabric.) Rust also supports atomic min/max and there's a proposal to add support in C++: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2021/p0493r3.pdf.
It's not obvious to me if we'd want to implement min/max as read-conditionally-modify operations or a read-modify-write operations. The C++ proposal above uses RMW to match other operators, and I'd default to following that and optionally doing a RCM if users opt-in to relaxed memory order or something. I think in practice min/max won't be highly contended operations so there probably won't be much of a performance difference (RMW means all writers need exclusive cache line access)
I'd be in favor of this, and it also seems thematically related to https://github.com/chapel-lang/chapel/issues/7629 (which I also continue to be in favor of). Both seem to take the theme of elevating min/max to more of an equal sibling to other operators.
I found a case where this would be particularly useful. I have some code that's trying to compute min/max/average for some data & average can be handled with atomic adds (for the number of items and the total, to divide later). But min/max are nowhere to be seen for atomics.
Also in thinking about this (before coming across this issue) I found https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2024/p0493r5.pdf which is a newer version of the article Elliot linked to above. It includes an example implementation which they've benchmarked and shown to be about as fast as an ARMv8 atomic-max instruction.
template <typename T>
T atomic_fetch_max_explicit(atomic<T>* pv,
typename atomic<T>::value_type v,
memory_order m) noexcept {
auto const mr = drop_release(m);
auto t = (mr != m) ? pv->fetch_add(0, m) : pv->load(mr);
while (max(v, t) != t) {
if (pv->compare_exchange_weak(t, v, m, mr))
return t;
}
return t;
}
It's not obvious to me if we'd want to implement min/max as read-conditionally-modify operations or a read-modify-write operations.
This newer C++ proposal is always RMW if the memory order is sequentially consistent, release, or acq_rel ordering, but doesn't fret about it for relaxed or acquire ordering. (That is, assuming drop_release does what I think it does. I can't find documentation or implementation of that, but I have a pretty high certainty that the fetch_add(0) is required for memory_order_seq_cst)
I've created PR #25900 with an initial attempt at this.
@jhh67 / @mppf: Can this be closed now that #25900 is merged?
Do we want to add min/max network atomic support to ugni? I assume not, in which case this can be closed.
I agree that's the main question. I think we should close this issue, and if we decide we want ugni support, we make a new issue. (It's unclear to me if we need ugni support for it, so I'm not going to immediately make such an issue).
I'll close this issue now.
I'd consider ugni sufficiently close to end-of-life to not be a reason to keep it open for that. Just for my own reference (since Engin pointed out that Piz Daint is ugni and has at least one Chapel-interested user on it), what is the behavior if someone attempts to use atomic min/max on ugni (using the default network atomics setting?)
They will get an error message of the form "fetchMin is not supported by CHPL_NETWORK_ATOMICS=ugni".
Perfect, thanks!