flux-core
flux-core copied to clipboard
broker: allow custom topology to be configured
As discussed in #3799, this adds an optional "parent" key to [bootstrap] hosts array entries so that a custom topology can be expressed. Here's an example from a tiny test cluster:
hosts = [
{ host = "picl0" },
{ host = "picl[1-2]", parent = "picl0" },
{ host = "picl[3-5]", parent = "picl1" },
{ host = "picl[6-7]", parent = "picl2" },
]
garlick@picl0:~$ flux overlay status
0 picl0: full
├─ 1 picl1: full
│ ├─ 3 picl3: full
│ ├─ 4 picl4: full
│ └─ 5 picl5: full
└─ 2 picl2: full
├─ 6 picl6: full
└─ 7 picl7: full
A larger cluster that preserves the mapping of ranks to hostname suffixes while using non consecutive nodes as routers as might be done in a system with scalable units:
[[bootstrap.hosts]]
host = "test[0-127]"
[[bootstrap.hosts]]
host = "test[1,64]"
parent = "test0"
[[bootstrap.hosts]]
host = "test[2-63]"
parent = "test1"
[[bootstrap.hosts]]
host = "test[65-127]"
parent = "test64"
Re-pushed with minor changes to error messages and enhanced testing. Also rebased on current master.
Rebased on current master and fixed a small bug. Please hold off reviewing for just a bit - @grondo had a great idea about topology plugins with a URI style scheme. I'd like to make sure this is at least compatible with that concept going forward.
This looks great @garlick. When discussing some of the results with Bronis earlier today he brought up the idea of trying a binomial tree to see if we get better barrier and broadcast etc. performance out of the overlay. I think we can try that with this without any modifications as long as the leaf depth doesn't need to be the same for all leaves.
Just pushed with some refactoring to support a URI for topology creation that can fit with a (future) plugin scheme for topology generation. In this PR there are two "builtin plugins", one called kary and one called custom for the parent directives in the bootstrap.hosts array. Probably we need to allow the topology to be set either on command line or in config.
It should be pretty easy to add code that auto-generates various topologies.
Repushed wtih tbon.fanout replaced wtih tbon.topo, and now it can be configured via TOML as well.
To facilitate transition, I left code in place to make tbon.fanout=K an alias for tbon.topo=kary:K.
This is probably ready for review.
going through the commits now, but wondering at a high level, if a parent is not listed for a node, is the parent assumed to be rank 0?
like what if I did the following (which I stole from your commit message example)
hosts = [
{ host = "test1" },
{ host = "test[0,64,128,192]", parent = "test1" },
{ host = "test[2-63]", parent = "test0" },
{ host = "test[65-127]", parent = "test64" },
{ host = "test[129-191]", parent = "test128" },
{ host = "test[193-255]" }
]
would the parent of test[193-255] be test1? I didn't see a test listed, but might be a good test to add.
side note, i'm only through 2 commits so far, so possible this becomes clear in later commits.
high level comment, I think having the default configuration be "custom" even if the user doesn't specify "parent" might sound a tad confusing? Like if I don't specify a "parent" at all in the config, but the user sees "custom" as the "tbon.topo" config, would that be confusing?
If the user doesn't specify a parent at all, perhaps could have a special default configuration of "flat" (or similar word")?
Just an idea, I'm going back and forth.
if a parent is not listed for a node, is the parent assumed to be rank 0?
Yes - I hope it's clear in flux-config-bootstrap(5):
In a
customtopology, rank 0 is the upstream peer of all other ranks, unlessparentkeys are present in thehostsarray to define a different tree shape.
high level comment, I think having the default configuration be "custom" even if the user doesn't specify "parent" might sound a tad confusing? Like if I don't specify a "parent" at all in the config, but the user sees "custom" as the "tbon.topo" config, would that be confusing?
I waffled on "custom" vs "kary:0" for that case, but ultimately it seemed simpler (both in code and documentation) to say that the default for config file bootstrap is "custom" unless overridden.
Fixed the issues raised so far and forced a push (thanks @chu11!)
I waffled on "custom" vs "kary:0" for that case, but ultimately it seemed simpler (both in code and documentation) to say that the default for config file bootstrap is "custom" unless overridden.
Ahhh it'd be difficult to convey "kary:0" default in some cases but "kary:2" default in other cases. Yeah, pros and cons on each approach. If you're not won over with my "flat" idea, then the current is ok.
Yeah, I think this way is slightly less confusing so I'll leave it as is unless somebody feels more strongly than me :-)
One thing I do not like in retrospect about the code for custom is that I added topology_hosts_set() so that the bootstrap code could make the expanded hosts array available to the custom plugin. Instead, I should add a topology aux container so that arbitrary data can be made available. Since there's already a topology_aux_set() for associating data with ranks, I'll rename that to topology_rank_aux_set() which is clearer anyway. (I think an aux container is probably better in this case than making broker stuff globally available, since this topology class is otherwise standalone. And anyway the expanded hosts array is ephemeral data so it wouldn't fit that model anyway)
Edit: except this data is needed before topology_create() is called so it is global :-( I don't have a great way to resolve this so I think maybe this is the least worst for now. I'll go ahead and push my commit for renaming the existing aux functions since the name was misleading anyhow.
Codecov Report
Merging #4675 (3e68125) into master (62c95e2) will decrease coverage by
0.03%. The diff coverage is82.79%.
@@ Coverage Diff @@
## master #4675 +/- ##
==========================================
- Coverage 83.40% 83.37% -0.04%
==========================================
Files 413 413
Lines 69489 69606 +117
==========================================
+ Hits 57956 58031 +75
- Misses 11533 11575 +42
| Impacted Files | Coverage Δ | |
|---|---|---|
| src/bindings/lua/flux-lua.c | 87.68% <ø> (-0.04%) |
:arrow_down: |
| src/broker/overlay.c | 85.99% <72.72%> (-0.50%) |
:arrow_down: |
| src/broker/boot_config.c | 81.72% <74.00%> (-0.03%) |
:arrow_down: |
| src/broker/boot_pmi.c | 66.93% <87.50%> (-0.68%) |
:arrow_down: |
| src/broker/topology.c | 92.70% <90.42%> (-0.77%) |
:arrow_down: |
| src/connectors/loop/loop.c | 81.96% <100.00%> (-0.30%) |
:arrow_down: |
| src/common/libsubprocess/server.c | 60.54% <0.00%> (-3.79%) |
:arrow_down: |
| src/broker/attr.c | 81.66% <0.00%> (-3.47%) |
:arrow_down: |
| src/cmd/flux-mini.py | 92.35% <0.00%> (-0.25%) |
:arrow_down: |
| ... and 4 more |
Thanks! I think this is probably OK and if we get it merged we do have time to tweak it before the next release, if needed. I'll go ahead and set MWP.