
CL/HIER: check global team status

Open · Sergei-Lebedev opened this issue 2 years ago · 1 comment

What

CL HIER should report global status on team create.

Why ?

The selection table may differ from rank to rank if each rank considers only its local team-create status. Internal issue: https://redmine.mellanox.com/issues/3336577

How ?

Perform a service team allreduce at the end of CL HIER team create so that every rank learns the global team status.
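A minimal sketch of the idea, not the actual UCC code. The status values and the function names `global_status` and `hier_enabled` are hypothetical, and the loop over an array stands in for a MIN allreduce over the service team. It assumes "OK" is the largest value and errors are negative, so a MIN reduction returns an error if any rank reports one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes for illustration only (not the real UCC enum).
   Assumption: OK is 0 and every error is negative. */
typedef enum {
    SKETCH_OK                = 0,
    SKETCH_ERR_NOT_SUPPORTED = -1,
    SKETCH_ERR_NO_RESOURCE   = -2
} sketch_status_t;

/* Stand-in for a MIN allreduce: each array element is the local status
   one rank would contribute; the return value is what every rank would
   receive after the reduction. */
sketch_status_t global_status(const sketch_status_t *local, size_t n_ranks)
{
    assert(local != NULL && n_ranks > 0);
    sketch_status_t result = local[0];
    for (size_t i = 1; i < n_ranks; i++) {
        if (local[i] < result) {
            result = local[i];
        }
    }
    return result;
}

/* Every rank makes the same decision because it is based on the
   global status rather than the local one. */
int hier_enabled(const sketch_status_t *local, size_t n_ranks)
{
    return global_status(local, n_ranks) == SKETCH_OK;
}
```

With this, a failure on a single rank disables CL HIER everywhere, so all ranks would build the same selection table.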

Sergei-Lebedev, Jan 24 '23 10:01

It is probably worth moving the team-level allreduce logic into the core: at the very end of ucc_team_create_test, after all CLs are created. That way there is always exactly one "status exchange" allreduce at the end of team creation, and the CL statuses are part of it. If at some point we need to exchange (synchronize) more information at team creation, we can piggyback it on the same allreduce. For example, CL/BASIC may currently also need to synchronize which TLs were created; both CLs could then do that in a single allreduce.

makes sense?
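One way the combined exchange could look, as a rough sketch under assumptions. The bit layout, the placeholder names `BIT_TL_A` and `BIT_TL_B`, and the function `exchange_band` are all hypothetical, and the loop stands in for a single bitwise-AND allreduce. Each rank sets one bit per component it created locally; after the reduction a bit stays set only if that component was created on every rank.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit layout: one bit per CL or TL created on this rank.
   TL_A and TL_B are placeholder names, not real UCC TLs. */
enum {
    BIT_CL_BASIC = 1u << 0,
    BIT_CL_HIER  = 1u << 1,
    BIT_TL_A     = 1u << 2,
    BIT_TL_B     = 1u << 3
};

/* Stand-in for one bitwise-AND allreduce: each array element is the
   mask one rank would contribute; the return value is what every rank
   would see afterwards. */
uint32_t exchange_band(const uint32_t *local_masks, size_t n_ranks)
{
    assert(local_masks != NULL && n_ranks > 0);
    uint32_t result = local_masks[0];
    for (size_t i = 1; i < n_ranks; i++) {
        result &= local_masks[i];
    }
    return result;
}
```

The CL statuses and the TL information travel in the same word, so adding another item to synchronize only costs another bit rather than another collective.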

vspetrov, Jan 25 '23 13:01