dash
dash copied to clipboard
Improvements to dash::accumulate
The current implementation of dash::accumulate
has a flaw: only unit 0 (randomly chosen?) actually receives a valid result. This is neither documented nor does the user have a chance to influence this. I seem to remember that we had a similar discussion at our last F2F. IMO, either the user can choose which unit receives the result or it should be broadcast to all units. At the very least, it has to be documented...
Also, dash::accumulate
should use dart_accumulate
whenever possible.
Discovered while working on #212.
The current realization of dash::accumulate
is somewhat irritating, yes. Oh yes.
First things first:
-
dart_accumulate
semantically corresponds toMPI_Accumulate
:C[i] = reduce(A[i], B[i])
-
dash::accumulate
semantically corresponds tostd::accumulate
:Acc = reduce([AccInit] v A[s...e])
Consequently, dart_accumulate
is not a semantically suitable backend for dash::accumulate
.
C++ and MPI differ in their definitions of "accumulate" (C++ uses the term correctly, however std::transform
should be named std::map
, but can't, because of the container. It's a bloody mess).
Semantics of dart_accumulate
correspond to dash::transform
(~ std::transform
), and it is in fact used as its communication backend:
https://github.com/dash-project/dash/blob/development/dash/include/dash/algorithm/Transform.h#L252
The best I could do for this clash of specs was to document it:
- https://github.com/dash-project/dash/blob/development/dash/include/dash/algorithm/Accumulate.h#L70
- https://github.com/dash-project/dash/blob/development/dash/include/dash/algorithm/Accumulate.h#L118
Standardese aside, the current implementation is broken. I found another defect: the initial value is accumulated in every local partial result, but should only be accumulated once, at unit 0. Currently implemented behavior is:
// (a, b] = { 1,2,3, 4,5,6 }
// --> accumulate(a,b,+,10) = 31
unit 0: | unit 1:
=================================+==================================
dash::accumulate(a, b, +, 10); | dash::accumulate(a, b, +, 10);
--> local result: 10 + 1 + 2 + 3 | --> local result: 10 + 4 + 5 + 6
= 16 | = 25
----------------------------- barrier ------------------------------
result = 16 + 25 = 41 != 31
Thanks for the enlightenment! Then maybe accumulate_impl
should not reside in Accumulate.h
since the semantics are different? Who owns the code?
Emm. You moved it there? Or @fmoessbauer did and you reviewed it: https://github.com/dash-project/dash/commit/71d57b7895d33736d0bef6b2c8d78a44e60e8a0b
I implemented the algorithm stuff and placed accumulate_impl
in Transform.h
... but we all own the code!
I know. Let me rephrase: Who will fix it? ;)
Oh :D I will, my original implementation is broken anyways.
... aaaand also: as it is defined in DASH, accumulate_impl
should be named transform_impl
, because that's the DASH semantics.
Resolved in PR #237
Reopening this issue after accidentally looking at the code for dash::transform
: The way I understand the semantics of dash::transform
is that it performs pair-wise operations on two input ranges into one output range. This is done using dart_accumulate
, a wrapper around MPI_Accumulate
. The latter does exactly what is says: it accumulates a single input range into the single output value. So, this is exactly what dash::accumulate
does but is far from the semantics of dash::transform
. Am I missing something here?
@devreal
MPI_Accumulate
does not reduce a sequence to a single value, if I get the docs right:
https://www.mpich.org/static/docs/v3.1/www3/MPI_Accumulate.html
It performs an element-wise atomic update operation, similar to a put, but reduces origin and target data into the target buffer. This is why there is a parameter target_count
. If the input sequence was reduced to a single value, it would always be 1 and therefore irrelevant.
So that's why dash::accumulate
(DASH -> STL semantics) follows std::accumulate
and dart_accumulate
(DART -> MPI semantics) implements dash::transform
/ std::transform
.
The synopsis described in these lecture slides might be more helpful than the vague definition in the MPI standard docs:
http://wgropp.cs.illinois.edu/courses/cs598-s16/lectures/lecture34.pdf
Right, got the docs wrong. Sorry for that.
I have to reopen this once again because the implementation of dash::accumulate is still broken, which was the actual reason for this issue. Please see my first paragraph in the initial description:
The current implementation of dash::accumulate has a flaw: only unit 0 (randomly chosen?) actually receives a valid result. This is neither documented nor does the user have a chance to influence this. I seem to remember that we had a similar discussion at our last F2F. IMO, either the user can choose which unit receives the result or it should be broadcast to all units. At the very least, it has to be documented...
I guess we got completely side-tracked by the discussion of the semantics of MPI_Accumulate
etc but the original problem still persists.
EDIT: as a side-node, there is another problem: the definition of result
as auto
will lead to really interesting results when used with floating point values...
EDIT2: why not have dash::accumulate
get the result to all units in the team and introduce dash::accumulate_at
(or an additional unit
paramter to dash::accumulate
) which accumulates to a single unit. Also, dash::accumulate
can make use of dart_allreduce
and dart_reduce
for basic types.
Will be fixed along with introduction of execution- and launch policies, see PR #300