chapel icon indicating copy to clipboard operation
chapel copied to clipboard

Support atomic min/max

Open ronawho opened this issue 2 years ago • 3 comments

Our atomic support is largely based on C11/C++11, so which operations we support is also largely the same. The main current exception is that we also support operations on real's, but C++20 added support for that too. I think there'd be value in adding support for an atomic min and max as well. We have a few tests that manually implement atomic max (see 9689096677884fcf35b56ba7c22ec5172c65039b) and we have support for min/max reductions, which could be optimized by doing the cross task reductions with atomics.

For other motivation, there's hardware support in some HPC networks and most communication libraries support them (including gasnet, ugni and libfabric.) Rust also supports atomic min/max and there's a proposal to add support in C++: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2021/p0493r3.pdf.

It's not obvious to me if we'd want to implement min/max as read-conditionally-modify operations or a read-modify-write operations. The C++ proposal above uses RMW to match other operators, and I'd default to following that and optionally doing a RCM if users opt-in to relaxed memory order or something. I think in practice min/max won't be highly contended operations so there probably won't be much of a performance difference (RMW means all writers need exclusive cache line access)

ronawho avatar Aug 03 '23 17:08 ronawho

I'd be in favor of this, and it also seems thematically related to https://github.com/chapel-lang/chapel/issues/7629 (which I also continue to be in favor of). Both seem to take the theme of elevating min/max to more of an equal sibling to other operators.

bradcray avatar Aug 08 '23 21:08 bradcray

I found a case where this would be particularly useful. I have some code that's trying to compute min/max/average for some data & average can be handled with atomic adds (for the number of items and the total, to divide later). But min/max are nowhere to be seen for atomics.

Also in thinking about this (before coming across this issue) I found https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2024/p0493r5.pdf which is a newer version of the article Elliot linked to above. It includes an example implementation which they've benchmarked and shown to be about as fast as an ARMv8 atomic-max instruction.

template <typename T>
T atomic_fetch_max_explicit(atomic<T>* pv,
                            typename atomic<T>::value_type v,
                            memory_order m) noexcept {
  auto const mr = drop_release(m);
  auto t = (mr != m) ? pv->fetch_add(0, m) : pv->load(mr);
  while (max(v, t) != t) {
    if (pv->compare_exchange_weak(t, v, m, mr))
      return t;
  }
  return t;
}

It's not obvious to me if we'd want to implement min/max as read-conditionally-modify operations or a read-modify-write operations.

This newer C++ proposal is always RMW if the memory order is sequentially consistent, release, or acq_rel ordering, but doesn't fret about it for relaxed or acquire ordering. (That is, assuming drop_release does what I think it does. I can't find documentation or implementation of that, but I have a pretty high certainty that the fetch_add(0) is required for memory_order_seq_cst)

mppf avatar Aug 15 '24 12:08 mppf

I've created PR #25900 with an initial attempt at this.

mppf avatar Sep 06 '24 19:09 mppf

@jhh67 / @mppf: Can this be closed now that #25900 is merged?

bradcray avatar Dec 06 '24 04:12 bradcray

Do we want to add min/max network atomic support to ugni? I assume not, in which case this can be closed.

jhh67 avatar Dec 06 '24 13:12 jhh67

I agree that's the main question. I think we should close this issue, and if we decide we want ugni support, we make a new issue. (It's unclear to me if we need ugni support for it, so I'm not going to immediately make such an issue).

I'll close this issue now.

mppf avatar Dec 06 '24 13:12 mppf

I'd consider ugni sufficiently close to end-of-life to not be a reason to keep it open for that. Just for my own reference (since Engin pointed out that Piz Daint is ugni and has at least one Chapel-interested user on it), what is the behavior if someone attempts to use atomic min/max on ugni (using the default network atomics setting?)

bradcray avatar Dec 06 '24 17:12 bradcray

They will get an error message of the form "fetchMin is not supported by CHPL_NETWORK_ATOMICS=ugni".

jhh67 avatar Dec 06 '24 22:12 jhh67

Perfect, thanks!

bradcray avatar Dec 07 '24 00:12 bradcray