
OOB Allgather details

Open · vspetrov opened this issue 4 years ago · 4 comments

I've got several questions regarding OOB Allgather:

  1. Do we assume that the ordering of the buffers in the destination is fixed when oob.allgather is called several times? For example, I can imagine (though it would be very weird) an allgather implementation in which all ranks send their data to a root, which receives the buffers in ANY order and packs them one after another. Such an implementation would produce different results from one call to the next. Do we allow this?
  2. Suppose the user calls ucc_team_create_post providing the EP value as input together with EP_RANGE_CONTIG. In other words, the user specifies the "rank" of the calling process in the team. Does this "rank" have to match the ordering in oob.allgather? I.e., if I invoke oob.allgather with a unique process identifier as input and then parse the result buffer, I will find the local id at the position corresponding to "rank". Is that correct?
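To make question 1 concrete, here is a small in-process C simulation contrasting two ways a root could assemble the allgather result. All names are hypothetical and are not part of the UCC API: `contribs[r]` stands for the bytes sent by rank r, and `arrival[i]` is the rank whose message the root happened to receive i-th. Packing in arrival order gives a result that depends on timing; writing each buffer at its sender's slot gives the same result for any arrival order, which is the deterministic behavior the question is about.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical root-side packing, NOT the UCC API.
 * contribs[r]: the len bytes contributed by rank r.
 * arrival[i]:  the rank whose message was received i-th. */

/* Packs buffers one after another in the order they arrived,
 * so the layout of dst changes whenever the arrival order changes. */
static void pack_by_arrival(const char *const *contribs, const int *arrival,
                            int nranks, size_t len, char *dst)
{
    for (int i = 0; i < nranks; i++) {
        memcpy(dst + (size_t)i * len, contribs[arrival[i]], len);
    }
}

/* Places each buffer at the slot of its sender (offset rank * len),
 * so dst is identical for every arrival order. */
static void pack_by_rank(const char *const *contribs, const int *arrival,
                         int nranks, size_t len, char *dst)
{
    for (int i = 0; i < nranks; i++) {
        int r = arrival[i];
        assert(r >= 0 && r < nranks);
        memcpy(dst + (size_t)r * len, contribs[r], len);
    }
}
```

With three ranks contributing "A", "B", "C", arrival orders {0,1,2} and {2,0,1} yield "ABC" and "CAB" under `pack_by_arrival`, but "ABC" both times under `pack_by_rank`.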

vspetrov, Jan 20 '21 16:01

I guess the answers to the questions above are:

  1. No, we don't allow that, i.e., all calls to the allgather should be exactly consistent (deterministic): every call places each process's data at the same position in the destination buffer.
  2. Yes, the rank provided as the input EP parameter for team creation (or context creation) MUST match the logic of the OOB allgather, i.e., the data in the allgather's result buffer will be placed according to the provided rank.

@manjugv, am I understanding this correctly?

vspetrov, Feb 05 '21 19:02

Did we converge on this? If not, let's discuss it in the WG tomorrow.

manjugv, Feb 16 '21 23:02

I have a related question. What is the lifetime of the OOB allgather? Is it correct that the user must guarantee the OOB remains available until the team is destroyed?

Sergei-Lebedev, Feb 17 '21 05:02

Per discussion: the OOB must exist until the team is destroyed. @manjugv, is this reflected explicitly in the docs?
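A sketch of how user code might enforce this rule, using hypothetical types rather than the real UCC structures: the team keeps a pointer to the user's OOB handle, and a user-side counter makes releasing the OOB fail while any team created with it is still alive. The assumption here is simply that the library may keep using the OOB after the team-creation call returns; the thread only states the rule, not the reason.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical types, NOT the UCC API. */
typedef struct oob_handle {
    int users; /* number of live teams created with this OOB */
} oob_handle_t;

typedef struct team {
    oob_handle_t *oob; /* the team may still reach the OOB through this */
} team_t;

static team_t *team_create(oob_handle_t *oob)
{
    team_t *t = malloc(sizeof(*t));
    if (t == NULL) {
        return NULL;
    }
    t->oob = oob;
    oob->users++;
    return t;
}

static void team_destroy(team_t *t)
{
    t->oob->users--;
    free(t);
}

/* Returns 0 if the OOB may be released, -1 if a team still depends on it. */
static int oob_release(oob_handle_t *oob)
{
    return oob->users == 0 ? 0 : -1;
}
```

The correct teardown order is therefore: destroy every team first, then release the OOB.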

vspetrov, May 11 '21 19:05