oneDPL
oneDPL copied to clipboard
Numeric PSTL algos require copy-constructible
All of reduce, transform_reduce, exclusive_scan, and inclusive_scan, transform_exclusive_scan, and transform_inclusive_scan only have a precondition on the type of init that it meets the Cpp17MoveConstructible requirements, so when passing it to the next internal function it needs to be moved, not copied:
return transform_inclusive_scan(std::forward<_ExecutionPolicy>(__exec), __first, __last, __result, __binary_op,
- __pstl::__internal::__no_op(), __init);
+ __pstl::__internal::__no_op(), std::move(__init));
And they need to move when creating local variables:
- _Tp __temp = *__first;
+ _Tp __temp = std::move(*__first);
and when returning init as well:
- return std::make_pair(__result, __init);
+ return std::make_pair(__result, std::move(__init));
https://github.com/oneapi-src/oneDPL/blame/b132b83cddec63df5b3794aba8c154cb186d568e/include/oneapi/dpl/pstl/numeric_impl.h#L176-L182
Also reported as https://github.com/llvm/llvm-project/issues/118539 and https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117905
@jwakely Thanks for reporting this, and apologies for the delay in our response, it slipped through the cracks.
We will take a look at this and try to get a fix in to address it.
@jwakely I'm looking at this in the context of oneDPL, but first trying to get a full handle on the requirements for the serial implementations, before considering the implications with the parallel version and the specifics of the different backends.
I've noticed that libcxx and libstdc++ (including your patch) also seem to rely upon Cpp17MoveAssignable from the init type. I don't see that provided by the specification, is there something I'm missing there that guarantees the move assignment is available?
Aside: It looks like your patch continues to use std::inner_product for transform_reduce which seems to require CopyConstructible and CopyAssignable from the init type. You likely ran into this and are aware since the transform_reduce test is empty for now.
https://open-std.org/JTC1/SC22/WG21/docs/papers/2018/p0571r2.html proposes that the type of the intermediate value (init) needs to be both Cpp17MoveConstructible and Cpp17MoveAssignable.
Although that paper seems to be stuck in limbo (https://github.com/cplusplus/papers/issues/546) I think its analysis is correct, and there's no reasonable way to implement these algos without those requirements.
That paper also proposes changing inner_product to require movable instead of copyable.
Aha! This is exactly what I was looking for. Thanks for pointing this paper out.
I want to do some more investigation about if the extra requirements proposed in the paper would be sufficient for performant parallel implementations for our existing backends.
Also, since it is not currently in the standard and may not even make it into C++26, we will need to consider how to approach this (and with what priority).
Our plan:
As mentioned in #2327, our intention is to add require only move constructible and move assignable for these types when using host policies / backends. This PR starts this effort for reduce, with more to follow. This should align with libstdc++.
For device policies, we must also require sycl device-copyable for these types (which can be satisfied with move-only types, as long as move constructor and move assignment operator are equivalent to bitwise copy). For the short term, we will add a known limitation for device policies to require copy constructible and copy assignable for these types, until it is addressed.