dash Improvements to dash::accumulate

The current implementation of dash::accumulate has a flaw: only unit 0 (randomly chosen?) actually receives a valid result. This is neither documented nor does the user have a chance to influence this. I seem to remember that we had a similar discussion at our last F2F. IMO, either the user can choose which unit receives the result or it should be broadcast to all units. At the very least, it has to be documented...

Also, dash::accumulate should use dart_accumulate whenever possible.

Discovered while working on #212.

Dec 21 '16 15:12 devreal

The current realization of dash::accumulate is somewhat irritating, yes. Oh yes.

First things first:

dart_accumulate semantically corresponds to MPI_Accumulate: C[i] = reduce(A[i], B[i])
dash::accumulate semantically corresponds to std::accumulate: Acc = reduce([AccInit] v A[s...e])

Consequently, dart_accumulate is not a semantically suitable backend for dash::accumulate. C++ and MPI differ in their definitions of "accumulate" (C++ uses the term correctly, however std::transform should be named std::map, but can't, because of the container. It's a bloody mess).

Semantics of dart_accumulate correspond to dash::transform (~ std::transform), and it is in fact used as its communication backend:

https://github.com/dash-project/dash/blob/development/dash/include/dash/algorithm/Transform.h#L252

The best I could do for this clash of specs was to document it:

https://github.com/dash-project/dash/blob/development/dash/include/dash/algorithm/Accumulate.h#L70
https://github.com/dash-project/dash/blob/development/dash/include/dash/algorithm/Accumulate.h#L118

Standardese aside, the current implementation is broken. I found another defect: the initial value is accumulated in every local partial result, but should only be accumulated once, at unit 0. Currently implemented behavior is:

// (a, b] = { 1,2,3, 4,5,6 }
// --> accumulate(a,b,+,10) = 31

unit 0:                          | unit 1:
=================================+==================================
dash::accumulate(a, b, +, 10);   | dash::accumulate(a, b, +, 10);
--> local result: 10 + 1 + 2 + 3 | --> local result: 10 + 4 + 5 + 6
    = 16                         |     = 25
----------------------------- barrier ------------------------------
result = 16 + 25 = 41 != 31

Dec 21 '16 19:12 fuchsto

Thanks for the enlightenment! Then maybe accumulate_impl should not reside in Accumulate.h since the semantics are different? Who owns the code?

Dec 22 '16 09:12 devreal

Emm. You moved it there? Or @fmoessbauer did and you reviewed it: https://github.com/dash-project/dash/commit/71d57b7895d33736d0bef6b2c8d78a44e60e8a0b

I implemented the algorithm stuff and placed accumulate_impl in Transform.h ... but we all own the code!

Dec 22 '16 09:12 fuchsto

I know. Let me rephrase: Who will fix it? ;)

Dec 22 '16 09:12 devreal

Oh :D I will, my original implementation is broken anyways.

Dec 22 '16 09:12 fuchsto

... aaaand also: as it is defined in DASH, accumulate_impl should be named transform_impl, because that's the DASH semantics.

Dec 22 '16 09:12 fuchsto

Resolved in PR #237

Jan 25 '17 05:01 fuchsto

Reopening this issue after accidentally looking at the code for dash::transform: The way I understand the semantics of dash::transform is that it performs pair-wise operations on two input ranges into one output range. This is done using dart_accumulate, a wrapper around MPI_Accumulate. The latter does exactly what is says: it accumulates a single input range into the single output value. So, this is exactly what dash::accumulate does but is far from the semantics of dash::transform. Am I missing something here?

Feb 02 '17 09:02 devreal

@devreal MPI_Accumulate does not reduce a sequence to a single value, if I get the docs right:

https://www.mpich.org/static/docs/v3.1/www3/MPI_Accumulate.html

It performs an element-wise atomic update operation, similar to a put, but reduces origin and target data into the target buffer. This is why there is a parameter target_count. If the input sequence was reduced to a single value, it would always be 1 and therefore irrelevant.

So that's why dash::accumulate (DASH -> STL semantics) follows std::accumulate and dart_accumulate (DART -> MPI semantics) implements dash::transform / std::transform.

The synopsis described in these lecture slides might be more helpful than the vague definition in the MPI standard docs:

http://wgropp.cs.illinois.edu/courses/cs598-s16/lectures/lecture34.pdf

Feb 03 '17 19:02 fuchsto

Right, got the docs wrong. Sorry for that.

Feb 06 '17 08:02 devreal

I have to reopen this once again because the implementation of dash::accumulate is still broken, which was the actual reason for this issue. Please see my first paragraph in the initial description:

The current implementation of dash::accumulate has a flaw: only unit 0 (randomly chosen?) actually receives a valid result. This is neither documented nor does the user have a chance to influence this. I seem to remember that we had a similar discussion at our last F2F. IMO, either the user can choose which unit receives the result or it should be broadcast to all units. At the very least, it has to be documented...

I guess we got completely side-tracked by the discussion of the semantics of MPI_Accumulate etc but the original problem still persists.

EDIT: as a side-node, there is another problem: the definition of result as auto will lead to really interesting results when used with floating point values...

EDIT2: why not have dash::accumulate get the result to all units in the team and introduce dash::accumulate_at (or an additional unit paramter to dash::accumulate) which accumulates to a single unit. Also, dash::accumulate can make use of dart_allreduce and dart_reduce for basic types.

Feb 13 '17 09:02 devreal

Will be fixed along with introduction of execution- and launch policies, see PR #300

Feb 24 '17 00:02 fuchsto