dash icon indicating copy to clipboard operation
dash copied to clipboard

Add support for merging teams in DASH

Open fmoessbauer opened this issue 8 years ago • 5 comments

Currently a team split is a non reversible operation. Together with the specification (or current implementation) that each unit can only be part of a single team hierarchy, team splits cannot be tested in the unit tests. Hence I set the bug label.

To fix this I see the following possibilities:

  • allow one unit in multiple teams (requires that the team hierarchy is not a single tree)
  • provide a merge or join operation

As far as I see, DART 3.2 already provides an interface therefor.

fmoessbauer avatar Nov 20 '16 13:11 fmoessbauer

@fmoessbauer Which functionality would you like to test exactly? The team split is not reversible in the sense that the created team can not be destroyed explicitly, but on the other hand the original team also doesn't go away and can still be used.

I feel that team related operations can actually be very tricky to reason about and therefore I'm hesitant to include an operation such as a join or merge unless we come up with an actual use-case in the context of an application. As long as that compelling use case for merge/join does not exist I'm in favor of a usage model that is as restricted as possible. Currently that restricted model is: The only way to create a new team is to start with an existing team and build a subset.

As for multiple team hierarchies - yes, I think that is a viable model with cloned teams. Although this also requires some brainstorming about the consequences too...

fuerlinger avatar Nov 20 '16 14:11 fuerlinger

Currently each Team / sub team can only be split once. However in the unit tests we have multiple tests which split the dash::Team::All(). As all unit tests are executed in one DASH run, the ordering of the tests matters.The second testcase that splits a team will always fail.

However each test case should not depend on the env created by previous tests. The problem can easily be verified by running the following tests isolated.

  • Array*
  • TeamTest* Each testgroup itself passes, but if run together the following exception is thrown:
[    0 ERROR ] [  7065 ] Team.cc                  :62   | dash::exception::InvalidArgument             | [ Unit 0 ] child of team 0 already set to 1, cannot set to 3

Due to this behavior, the CI of PR #135 fails.

fmoessbauer avatar Nov 20 '16 14:11 fmoessbauer

Here is an implementation sketch:

Implement a method clone() that clones the current team, i.e., creates a copy of a team on the C++ side for the same underlying DART team.

  • add a boolean variable _is_clone to differentiate the original from the clone(s).
  • actual de-allocation of DART resources should only happen in the destructor of originals. Cloned teams leave the DART resources alone.
  • implement a method clone() that clones the current team:
    • if the current team has a parent, call clone() on the parent first, let's call the new cloned parent tpar.
    • create a new team t, set the is_clone flag to true
    • set tpar's child pointer to t (if the tpar exists)
    • set t's parent pointer to the tpar (if the tpar exists)
    • in the originals we might then want to keep track of created clones
  • update the split() function:
    • when calling split() on a team that already has a child team, call clone() first and then invoke split() on newly created clone.

The recommended way of using teams in the unit tests (and elsewhere) would then be :

Team &t = dash::Team::All().clone();
team &split = t.split(2);

fuerlinger avatar Nov 22 '16 09:11 fuerlinger

Also, from issue #140: All teams should be destroyed in dart_finalize.

fuchsto avatar Nov 23 '16 14:11 fuchsto

Is this going anywhere?

devreal avatar May 02 '19 06:05 devreal