physicsnemo icon indicating copy to clipboard operation
physicsnemo copied to clipboard

🐛[BUG]: Order of communicators in comm tree is not in the order as they were added

Open azrael417 opened this issue 2 years ago • 1 comments

Version

0.5.0

On which installation method(s) does this occur?

Pip

Describe the issue

I have the following problem: although I am adding process groups in a specific order to the tree, the printed tree, when converted to dict, has the wrong order. The ordering is important because the user wants to influence of how its communicators are laid out, ideally in a fashion so that ranks in communicators instantiated first are closer together.

Minimum reproducible example

from modulus.distributed.manager import DistributedManager
from modulus.distributed.config import ProcessGroupNode, ProcessGroupConfig

leaf_config = {"h": 2, "w": 2, "fin": 1, "fout": 1}

world = ProcessGroupNode('world')
pconfig = ProcessGroupConfig(world)

pconfig.add_node(ProcessGroupNode("data"), parent="world")

pconfig.add_node(ProcessGroupNode("model"), parent="world")

pconfig.add_node(ProcessGroupNode("spatial"), parent="model")
pconfig.add_node(ProcessGroupNode("matmul"), parent="model")

pconfig.add_node(ProcessGroupNode("h", size=leaf_config["h"]), parent="spatial")
pconfig.add_node(ProcessGroupNode("w", size=leaf_config["w"]), parent="spatial")

pconfig.add_node(ProcessGroupNode("fin", size=leaf_config["fin"]), parent="matmul")
pconfig.add_node(ProcessGroupNode("fout", size=leaf_config["fout"]), parent="matmul")

print(pconfig.tree.to_dict())

Relevant log output

This prints:

>>> print(pconfig.tree.to_dict())
{'world': {'children': ['data', {'model': {'children': [{'matmul': {'children': ['fin', 'fout']}}, {'spatial': {'children': ['h', 'w']}}]}}]}}

So although the group "spatial" was added first to its parent "model", the order in the tree view is swapped. I am not 100% sure if that will be the same order in which those communicators will be instantiated but it would be good to verify that nodes which are added first are guaranteed to be instantiated first. The children are stored in a list which has strict ordering. Therefore, it should be easy to fix this.

Environment details

I think this is not relevant for this issue.

azrael417 avatar Nov 30 '23 07:11 azrael417

According to the documentation, the nodes in treelib should be ordered as they are appended. So if that is the case, I think we should be good.

azrael417 avatar Nov 30 '23 11:11 azrael417