modulus
🚀[FEA]: Print communicator layout when using verbose=True in DistributedManager.create_groups_from_config
Is this a new feature, an improvement, or a change to existing functionality?
New Feature
How would you describe the priority of this feature request
Critical (currently preventing usage)
Please provide a clear description of the problem you would like to solve.
I am looking into the distributed manager. When creating a tree of communicators with DistributedManager.create_groups_from_config, you can pass verbose=True to print the layout. The output looks something like this:
Node ID: world
Children: [Node(tag=model, identifier=model, data=ProcessGroupNode(name=model, size=8, ), Node(tag=data, identifier=data, data=ProcessGroupNode(name=data, size=4, )]
Node ID: model
Children: [Node(tag=spatial, identifier=spatial, data=ProcessGroupNode(name=spatial, size=4, ), Node(tag=matmul, identifier=matmul, data=ProcessGroupNode(name=matmul, size=2, )]
Node ID: data
Children: []
Node ID: spatial
Children: [Node(tag=h, identifier=h, data=ProcessGroupNode(name=h, size=4, ), Node(tag=w, identifier=w, data=ProcessGroupNode(name=w, size=1, )]
Node ID: matmul
Children: [Node(tag=fin, identifier=fin, data=ProcessGroupNode(name=fin, size=2, ), Node(tag=fout, identifier=fout, data=ProcessGroupNode(name=fout, size=1, )]
Node ID: h
Children: []
Node ID: w
Children: []
Node ID: fin
Children: []
Node ID: fout
Children: []
Describe any alternatives you have considered
I think it would be good not to print an empty list for leaf nodes, although this is more or less cosmetic. What would be much more useful is to print the list of world ranks associated with every node. For example, if I have 8 ranks split into 2 model-parallel groups of 4 ranks and 4 data-parallel groups of 2 ranks, it would be good to see something like this:
world: [0, 1, 2, 3, 4, 5, 6, 7]
model: [[0, 1, 2, 3], [4, 5, 6, 7]]
data: [[0, 4], [1, 5], [2, 6], [3, 7]]
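As a rough illustration of the requested output, here is a minimal, self-contained sketch in plain Python. It does not use Modulus: GroupNode, leaves, group_ranks and print_layout are hypothetical names. It assumes ranks are laid out in mixed-radix order over the leaf groups, with the first-declared leaf varying fastest, which reproduces the example above; the ordering DistributedManager actually uses may differ.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from math import prod


@dataclass
class GroupNode:
    # Hypothetical stand-in for a node of the communicator tree; child
    # sizes are expected to multiply to the parent's size.
    name: str
    size: int
    children: list[GroupNode] = field(default_factory=list)


def leaves(node: GroupNode) -> list[GroupNode]:
    """Leaf groups below `node`, in declaration order."""
    if not node.children:
        return [node]
    assert prod(c.size for c in node.children) == node.size, node.name
    return [leaf for child in node.children for leaf in leaves(child)]


def group_ranks(root: GroupNode, node: GroupNode) -> list[list[int]]:
    """World ranks of every communicator that `node` stands for.

    Assumed layout: a rank's coordinates over the leaf groups are its
    mixed-radix digits, with the first-declared leaf varying fastest.
    """
    dims = leaves(root)
    own = {id(d) for d in leaves(node)}
    groups: dict[tuple[int, ...], list[int]] = {}
    for rank in range(prod(d.size for d in dims)):
        rest, key = rank, []
        for d in dims:
            coord, rest = rest % d.size, rest // d.size
            if id(d) not in own:  # ranks sharing all other coordinates form one group
                key.append(coord)
        groups.setdefault(tuple(key), []).append(rank)
    return list(groups.values())


def print_layout(root: GroupNode, node: GroupNode | None = None, depth: int = 0) -> None:
    """Print each node with its world ranks; leaves get no empty child list."""
    if node is None:
        node = root
    ranks = group_ranks(root, node)
    shown = ranks[0] if node is root else ranks
    print(f"{'  ' * depth}{node.name} (size={node.size}): {shown}")
    for child in node.children:
        print_layout(root, child, depth + 1)


world = GroupNode("world", 8, [GroupNode("model", 4), GroupNode("data", 2)])
print_layout(world)
# world (size=8): [0, 1, 2, 3, 4, 5, 6, 7]
#   model (size=4): [[0, 1, 2, 3], [4, 5, 6, 7]]
#   data (size=2): [[0, 4], [1, 5], [2, 6], [3, 7]]
```

Here a node's size is the number of ranks per group, as in the verbose output above, where each node's size is the product of its children's sizes. The same functions accept the nested tree from that output (world of 32 = model 8 × data 4, with model = spatial 4 × matmul 2, and so on).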
Such a rank listing is very instructive for understanding how ranks are placed, and it helps with debugging. It matters especially when you use, say, all-to-all along one communicator direction and all-reduce along the other: you want to make sure that the all-to-all ranks are placed closer together than the all-reduce ranks. Printing this topology makes the communication layout much easier to understand.
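Once the rank lists are available, the placement concern above can be checked mechanically. Below is a minimal sketch, assuming ranks are placed on physical nodes in contiguous blocks (world rank r on node r // gpus_per_node, a common default, e.g. with torchrun when every node runs the same number of processes). nodes_touched and report_locality are hypothetical helpers, not Modulus APIs.

```python
from __future__ import annotations


def nodes_touched(group: list[int], gpus_per_node: int) -> list[int]:
    # Assumption: block placement, i.e. world rank r runs on physical
    # node r // gpus_per_node.
    return sorted({rank // gpus_per_node for rank in group})


def report_locality(name: str, groups: list[list[int]], gpus_per_node: int) -> bool:
    """Print which physical nodes each group spans; True if every group is intra-node."""
    all_local = True
    for group in groups:
        nodes = nodes_touched(group, gpus_per_node)
        where = "intra-node" if len(nodes) == 1 else f"spans nodes {nodes}"
        print(f"{name} {group}: {where}")
        all_local = all_local and len(nodes) == 1
    return all_local


model_groups = [[0, 1, 2, 3], [4, 5, 6, 7]]
data_groups = [[0, 4], [1, 5], [2, 6], [3, 7]]
# With 4 GPUs per node, all-to-all over "model" stays on one node,
# while all-reduce over "data" crosses nodes.
report_locality("model", model_groups, gpus_per_node=4)  # returns True
report_locality("data", data_groups, gpus_per_node=4)    # returns False
```

With this layout the bandwidth-hungry all-to-all stays inside a node, while only the all-reduce traffic crosses the inter-node network.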