ucc
CL/HIER: check global team status
What
CL/HIER should report the global team status on team creation.
Why ?
The selection table may differ across ranks if each rank considers only its local status. Internal issue: https://redmine.mellanox.com/issues/3336577
How ?
Perform a service team allreduce at the end of CL/HIER team creation so that every rank learns the global team status.
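To illustrate the idea, here is a minimal sketch of how a MIN reduction turns per-rank local statuses into one global status. It is not the code from this PR and does not call the UCC API: the `sketch_status_t` codes and `sketch_allreduce_min` are hypothetical names, the allreduce is simulated by a loop over an array of per-rank values, and the sketch assumes that success is 0 and every error code is negative.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes for this sketch only (assumption: success is 0,
 * errors are negative). These are not the actual UCC definitions. */
typedef enum {
    SKETCH_OK                = 0,
    SKETCH_ERR_NO_MEMORY     = -1,
    SKETCH_ERR_NOT_SUPPORTED = -2
} sketch_status_t;

/* Stand-in for a MIN allreduce over the service team: each rank contributes
 * its local status, and every rank receives the smallest value. With the
 * convention above, the result is SKETCH_OK only if all ranks succeeded. */
sketch_status_t sketch_allreduce_min(const sketch_status_t *local,
                                     size_t n_ranks)
{
    sketch_status_t global;
    size_t          i;

    assert(n_ranks > 0);
    global = local[0];
    for (i = 1; i < n_ranks; i++) {
        if (local[i] < global) {
            global = local[i];
        }
    }
    return global;
}
```

Because every rank sees the same reduced value, all ranks take the same decision (keep or drop the CL/HIER team), which is what keeps the selection table consistent.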
It is probably worth moving the team-level allreduce logic to the core: to the very end of ucc_team_create_test, after all CLs are created. That way there is always exactly one "status exchange" allreduce at the end of team creation, and the CL statuses are part of it. If at some point we add more info to exchange (synchronize) upon team creation, we can piggy-back it there as well. For example, CL/BASIC may currently also need to synchronize which TLs are created; then both CLs could do it in a single allreduce.
makes sense?
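A rough sketch of the piggy-backing idea, under stated assumptions: each rank fills a small fixed-size payload with one slot per item to synchronize, and a single element-wise MIN allreduce covers all slots. The slot layout, the TL names `TL_A` and `TL_B`, and `piggy_allreduce_min_vec` are hypothetical and do not come from UCC; the allreduce is again simulated by a loop over per-rank payloads. Status slots use 0 for success and negative values for errors; "created" slots use 1 or 0, so MIN yields 1 only if the TL was created on every rank.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical payload layout: one slot per item exchanged at team creation. */
enum {
    SLOT_CL_HIER_STATUS, /* 0 on success, negative on error */
    SLOT_TL_A_CREATED,   /* 1 if the (hypothetical) TL A was created locally */
    SLOT_TL_B_CREATED,   /* 1 if the (hypothetical) TL B was created locally */
    SLOT_COUNT
};

/* Stand-in for one element-wise MIN allreduce over all ranks: a single
 * exchange synchronizes every slot at once. */
void piggy_allreduce_min_vec(const int local[][SLOT_COUNT], size_t n_ranks,
                             int global[SLOT_COUNT])
{
    size_t r;
    int    s;

    assert(n_ranks > 0);
    for (s = 0; s < SLOT_COUNT; s++) {
        global[s] = local[0][s];
    }
    for (r = 1; r < n_ranks; r++) {
        for (s = 0; s < SLOT_COUNT; s++) {
            if (local[r][s] < global[s]) {
                global[s] = local[r][s];
            }
        }
    }
}
```

Adding another item to synchronize then only means adding a slot, not another collective. Items that do not fit a MIN reduction would need a different encoding or operation, which this sketch does not cover.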