dash
dash copied to clipboard
Add support for merging teams in DASH
Currently a team split is a non reversible operation. Together with the specification (or current implementation) that each unit can only be part of a single team hierarchy, team splits cannot be tested in the unit tests. Hence I set the bug label.
To fix this I see the following possibilities:
- allow one unit in multiple teams (requires that the team hierarchy is not a single tree)
- provide a merge or join operation
As far as I see, DART 3.2 already provides an interface therefor.
@fmoessbauer Which functionality would you like to test exactly? The team split is not reversible in the sense that the created team can not be destroyed explicitly, but on the other hand the original team also doesn't go away and can still be used.
I feel that team related operations can actually be very tricky to reason about and therefore I'm hesitant to include an operation such as a join or merge unless we come up with an actual use-case in the context of an application. As long as that compelling use case for merge/join does not exist I'm in favor of a usage model that is as restricted as possible. Currently that restricted model is: The only way to create a new team is to start with an existing team and build a subset.
As for multiple team hierarchies - yes, I think that is a viable model with cloned teams. Although this also requires some brainstorming about the consequences too...
Currently each Team / sub team can only be split once. However in the unit tests we have multiple tests which split the dash::Team::All()
. As all unit tests are executed in one DASH run, the ordering of the tests matters.The second testcase that splits a team will always fail.
However each test case should not depend on the env created by previous tests. The problem can easily be verified by running the following tests isolated.
- Array*
- TeamTest* Each testgroup itself passes, but if run together the following exception is thrown:
[ 0 ERROR ] [ 7065 ] Team.cc :62 | dash::exception::InvalidArgument | [ Unit 0 ] child of team 0 already set to 1, cannot set to 3
Due to this behavior, the CI of PR #135 fails.
Here is an implementation sketch:
Implement a method clone()
that clones the current team, i.e., creates a copy of a team on the C++ side for the same underlying DART team.
- add a boolean variable
_is_clone
to differentiate the original from the clone(s). - actual de-allocation of DART resources should only happen in the destructor of originals. Cloned teams leave the DART resources alone.
- implement a method
clone()
that clones the current team:- if the current team has a parent, call
clone()
on the parent first, let's call the new cloned parenttpar
. - create a new team
t
, set theis_clone
flag to true - set
tpar
's child pointer tot
(if thetpar
exists) - set
t
's parent pointer to thetpar
(if thetpar
exists) - in the originals we might then want to keep track of created clones
- if the current team has a parent, call
- update the
split()
function:- when calling
split()
on a team that already has a child team, callclone()
first and then invokesplit()
on newly created clone.
- when calling
The recommended way of using teams in the unit tests (and elsewhere) would then be :
Team &t = dash::Team::All().clone();
team &split = t.split(2);
Also, from issue #140: All teams should be destroyed in dart_finalize
.
Is this going anywhere?