Improvements to dash::accumulate
The current implementation of dash::accumulate has a flaw: only unit 0 (randomly chosen?) actually receives a valid result. This is neither documented nor does the user have a chance to influence this. I seem to remember that we had a similar discussion at our last F2F. IMO, either the user can choose which unit receives the result or it should be broadcast to all units. At the very least, it has to be documented...
Also, dash::accumulate should use dart_accumulate whenever possible.
Discovered while working on #212.
The current realization of dash::accumulate is somewhat irritating, yes. Oh yes.
First things first:
dart_accumulatesemantically corresponds toMPI_Accumulate:C[i] = reduce(A[i], B[i])dash::accumulatesemantically corresponds tostd::accumulate:Acc = reduce([AccInit] v A[s...e])
Consequently, dart_accumulate is not a semantically suitable backend for dash::accumulate.
C++ and MPI differ in their definitions of "accumulate" (C++ uses the term correctly, however std::transform should be named std::map, but can't, because of the container. It's a bloody mess).
Semantics of dart_accumulate correspond to dash::transform (~ std::transform), and it is in fact used as its communication backend:
https://github.com/dash-project/dash/blob/development/dash/include/dash/algorithm/Transform.h#L252
The best I could do for this clash of specs was to document it:
- https://github.com/dash-project/dash/blob/development/dash/include/dash/algorithm/Accumulate.h#L70
- https://github.com/dash-project/dash/blob/development/dash/include/dash/algorithm/Accumulate.h#L118
Standardese aside, the current implementation is broken. I found another defect: the initial value is accumulated in every local partial result, but should only be accumulated once, at unit 0. Currently implemented behavior is:
// (a, b] = { 1,2,3, 4,5,6 }
// --> accumulate(a,b,+,10) = 31
unit 0: | unit 1:
=================================+==================================
dash::accumulate(a, b, +, 10); | dash::accumulate(a, b, +, 10);
--> local result: 10 + 1 + 2 + 3 | --> local result: 10 + 4 + 5 + 6
= 16 | = 25
----------------------------- barrier ------------------------------
result = 16 + 25 = 41 != 31
Thanks for the enlightenment! Then maybe accumulate_impl should not reside in Accumulate.h since the semantics are different? Who owns the code?
Emm. You moved it there? Or @fmoessbauer did and you reviewed it: https://github.com/dash-project/dash/commit/71d57b7895d33736d0bef6b2c8d78a44e60e8a0b
I implemented the algorithm stuff and placed accumulate_impl in Transform.h
... but we all own the code!
I know. Let me rephrase: Who will fix it? ;)
Oh :D I will, my original implementation is broken anyways.
... aaaand also: as it is defined in DASH, accumulate_impl should be named transform_impl, because that's the DASH semantics.
Resolved in PR #237
Reopening this issue after accidentally looking at the code for dash::transform: The way I understand the semantics of dash::transform is that it performs pair-wise operations on two input ranges into one output range. This is done using dart_accumulate, a wrapper around MPI_Accumulate. The latter does exactly what is says: it accumulates a single input range into the single output value. So, this is exactly what dash::accumulate does but is far from the semantics of dash::transform. Am I missing something here?
@devreal
MPI_Accumulate does not reduce a sequence to a single value, if I get the docs right:
https://www.mpich.org/static/docs/v3.1/www3/MPI_Accumulate.html
It performs an element-wise atomic update operation, similar to a put, but reduces origin and target data into the target buffer. This is why there is a parameter target_count. If the input sequence was reduced to a single value, it would always be 1 and therefore irrelevant.
So that's why dash::accumulate (DASH -> STL semantics) follows std::accumulate and dart_accumulate (DART -> MPI semantics) implements dash::transform / std::transform.
The synopsis described in these lecture slides might be more helpful than the vague definition in the MPI standard docs:
http://wgropp.cs.illinois.edu/courses/cs598-s16/lectures/lecture34.pdf
Right, got the docs wrong. Sorry for that.
I have to reopen this once again because the implementation of dash::accumulate is still broken, which was the actual reason for this issue. Please see my first paragraph in the initial description:
The current implementation of dash::accumulate has a flaw: only unit 0 (randomly chosen?) actually receives a valid result. This is neither documented nor does the user have a chance to influence this. I seem to remember that we had a similar discussion at our last F2F. IMO, either the user can choose which unit receives the result or it should be broadcast to all units. At the very least, it has to be documented...
I guess we got completely side-tracked by the discussion of the semantics of MPI_Accumulate etc but the original problem still persists.
EDIT: as a side-node, there is another problem: the definition of result as auto will lead to really interesting results when used with floating point values...
EDIT2: why not have dash::accumulate get the result to all units in the team and introduce dash::accumulate_at (or an additional unit paramter to dash::accumulate) which accumulates to a single unit. Also, dash::accumulate can make use of dart_allreduce and dart_reduce for basic types.
Will be fixed along with introduction of execution- and launch policies, see PR #300