flux-core
system instance: need ability to map TBON topology to cluster topology
Problem: a system instance may benefit from configuring "router nodes" (interior tree nodes) to be the service nodes within a scalable unit, but the current configuration, which maps a k-ary tree of broker ranks onto a flat hostlist, does not directly support that.
Design a way to express the desired mapping in configuration.
Allow the TBON levels to have differing numbers of descendants instead of one k for the entire tree.
For example, we might want a primary management node as rank 0, the RPS nodes for each scalable unit as the first tree level, other service nodes within the scalable unit as a second tree level, and compute nodes as leaves.
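To make the per-level fanout idea concrete, here is a minimal sketch in Python, not flux-core code. The function name `level_fanout_parents` and the convention that `fanouts[d]` is the number of children of every node at depth `d` are assumptions made for illustration only. Ranks are assigned in breadth-first order, so a uniform fanout list reproduces the usual k-ary layout.

```python
def level_fanout_parents(fanouts, size):
    """Return a list where parents[r] is the parent rank of rank r.

    Hypothetical convention: fanouts[d] is the number of children given
    to every node at depth d. Ranks are handed out breadth-first, and
    rank 0 (the root) has parent None.
    """
    parents = [None] * size
    level = [0]        # ranks at the current depth
    next_rank = 1      # next unassigned rank
    for k in fanouts:
        next_level = []
        for parent in level:
            for _ in range(k):
                if next_rank >= size:
                    return parents   # ran out of ranks; tree is partial
                parents[next_rank] = parent
                next_level.append(next_rank)
                next_rank += 1
        level = next_level
    if next_rank < size:
        raise ValueError("fanouts describe fewer ranks than size")
    return parents


# Root with 2 children, each of which has 3 leaves (9 ranks total).
print(level_fanout_parents([2, 3], 9))
# prints [None, 0, 0, 1, 1, 1, 2, 2, 2]
```

With the same value at every level, e.g. `[2, 2]` for 7 ranks, the result matches standard k-ary heap indexing, `parent = (rank - 1) // k`.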
This is probably going to be needed sooner rather than later, so attaching to the next release milestone.
Dropping from the next release milestone since this is now tracked in a feature tracker, and we likely won't need it for a few months given current rollout plans which have us on sub-128 node systems through at least September 2022.
Just jotting down one idea about how to represent this in the TOML config in a convenient and compact way: an optional parent key in each hosts array entry.
Example: a 256-node cluster consisting of 64-node scalable units and a three-level TBON, with the router mapped to the first node in each unit and "test1" (arbitrarily) designated as the management node:
hosts = [
  { host = "test1" },
  { host = "test[0,64,128,192]", parent = "test1" },
  { host = "test[2-63]", parent = "test0" },
  { host = "test[65-127]", parent = "test64" },
  { host = "test[129-191]", parent = "test128" },
  { host = "test[193-255]", parent = "test192" }
]
It would be all or nothing: if parent is specified anywhere, the tree would need to be fully specified. Otherwise the topology would be generated as a function of fanout, with ranks in breadth-first order, as it is now.
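A rough sketch of how the all-or-nothing rule could be checked, written in Python rather than as broker code. Everything here is hypothetical: `expand_hostlist` is a minimal stand-in that handles only a single bracketed group without zero-padding, and `build_parent_map` assumes "fully specified" means that exactly one host (the root) has no parent and every named parent appears in the hosts array. The entries are given as Python dicts mirroring the TOML above.

```python
import re


def expand_hostlist(spec):
    """Expand e.g. "test[0,64,128-130]" to host names (minimal stand-in)."""
    m = re.fullmatch(r"([^\[\]]*)\[([^\[\]]+)\]([^\[\]]*)", spec)
    if not m:
        return [spec]                      # plain host name, no brackets
    prefix, body, suffix = m.groups()
    hosts = []
    for part in body.split(","):
        lo, _, hi = part.partition("-")
        for i in range(int(lo), int(hi or lo) + 1):
            hosts.append(f"{prefix}{i}{suffix}")
    return hosts


def build_parent_map(entries):
    """Return {host: parent or None}, or None if no entry has a parent key."""
    if not any("parent" in e for e in entries):
        return None                        # fall back to fanout-based tree
    parent_of = {}
    for e in entries:
        for host in expand_hostlist(e["host"]):
            if host in parent_of:
                raise ValueError(f"duplicate host {host}")
            parent_of[host] = e.get("parent")
    roots = [h for h, p in parent_of.items() if p is None]
    if len(roots) != 1:
        raise ValueError("exactly one host (the root) may omit parent")
    for host, parent in parent_of.items():
        if parent is not None and parent not in parent_of:
            raise ValueError(f"{host}: unknown parent {parent}")
    for host in parent_of:                 # walk up to the root, detect cycles
        seen = set()
        while host is not None:
            if host in seen:
                raise ValueError("cycle in parent relationships")
            seen.add(host)
            host = parent_of[host]
    return parent_of


# The hosts array from the example above, as Python dicts.
HOSTS = [
    {"host": "test1"},
    {"host": "test[0,64,128,192]", "parent": "test1"},
    {"host": "test[2-63]", "parent": "test0"},
    {"host": "test[65-127]", "parent": "test64"},
    {"host": "test[129-191]", "parent": "test128"},
    {"host": "test[193-255]", "parent": "test192"},
]
tree = build_parent_map(HOSTS)
```

For the example, `tree` has 256 entries, with "test1" as the only root. How broker ranks would then be assigned to hosts (for instance, breadth-first over this map) is left open here.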