flux-core icon indicating copy to clipboard operation
flux-core copied to clipboard

broker: allow custom topology to be configured

Open garlick opened this issue 3 years ago • 1 comments
trafficstars

As discussed in #3799, this adds an optional "parent" key to [bootstrap] hosts array entries so that a custom topology can be expressed. Here's an example from a tiny test cluster:

hosts = [
    { host = "picl0" },
    { host = "picl[1-2]", parent = "picl0" },
    { host = "picl[3-5]", parent = "picl1" },
    { host = "picl[6-7]", parent = "picl2" },
]
 garlick@picl0:~$ flux overlay status
0 picl0: full
├─ 1 picl1: full
│  ├─ 3 picl3: full
│  ├─ 4 picl4: full
│  └─ 5 picl5: full
└─ 2 picl2: full
   ├─ 6 picl6: full
   └─ 7 picl7: full

A larger cluster that preserves the mapping of ranks to hostname suffixes while using non consecutive nodes as routers as might be done in a system with scalable units:

[[bootstrap.hosts]]
host = "test[0-127]"

[[bootstrap.hosts]]
host = "test[1,64]"
parent = "test0"
[[bootstrap.hosts]]
host = "test[2-63]"
parent = "test1"
[[bootstrap.hosts]]
host = "test[65-127]"
parent = "test64"

garlick avatar Oct 12 '22 17:10 garlick

Re-pushed with minor changes to error messages and enhanced testing. Also rebased on current master.

garlick avatar Oct 17 '22 18:10 garlick

Rebased on current master and fixed a small bug. Please hold off reviewing for just a bit - @grondo had a great idea about topology plugins with a URI style scheme. I'd like to make sure this is at least compatible with that concept going forward.

garlick avatar Oct 19 '22 15:10 garlick

This looks great @garlick. When discussing some of the results with Bronis earlier today he brought up the idea of trying a binomial tree to see if we get better barrier and broadcast etc. performance out of the overlay. I think we can try that with this without any modifications as long as the leaf depth doesn't need to be the same for all leaves.

trws avatar Oct 20 '22 18:10 trws

Just pushed with some refactoring to support a URI for topology creation that can fit with a (future) plugin scheme for topology generation. In this PR there are two "builtin plugins", one called kary and one called custom for the parent directives in the bootstrap.hosts array. Probably we need to allow the topology to be set either on command line or in config.

It should be pretty easy to add code that auto-generates various topologies.

garlick avatar Oct 20 '22 19:10 garlick

Repushed wtih tbon.fanout replaced wtih tbon.topo, and now it can be configured via TOML as well. To facilitate transition, I left code in place to make tbon.fanout=K an alias for tbon.topo=kary:K.

This is probably ready for review.

garlick avatar Oct 21 '22 16:10 garlick

going through the commits now, but wondering at a high level, if a parent is not listed for a node, is the parent assumed to be rank 0?

like what if I did the following (which I stole from your commit message example)

hosts = [
    { host = "test1" },
    { host = "test[0,64,128,192]", parent = "test1" },
    { host = "test[2-63]",    parent = "test0" },
    { host = "test[65-127]",  parent = "test64" },
    { host = "test[129-191]", parent = "test128" },
    { host = "test[193-255]" }
]

would the parent of test[193-255] be test1? I didn't see a test listed, but might be a good test to add.

side note, i'm only through 2 commits so far, so possible this becomes clear in later commits.

chu11 avatar Oct 21 '22 20:10 chu11

high level comment, I think having the default configuration be "custom" even if the user doesn't specify "parent" might sound a tad confusing? Like if I don't specify a "parent" at all in the config, but the user sees "custom" as the "tbon.topo" config, would that be confusing?

If the user doesn't specify a parent at all, perhaps could have a special default configuration of "flat" (or similar word")?

Just an idea, I'm going back and forth.

chu11 avatar Oct 22 '22 21:10 chu11

if a parent is not listed for a node, is the parent assumed to be rank 0?

Yes - I hope it's clear in flux-config-bootstrap(5):

In a custom topology, rank 0 is the upstream peer of all other ranks, unless parent keys are present in the hosts array to define a different tree shape.

garlick avatar Oct 22 '22 21:10 garlick

high level comment, I think having the default configuration be "custom" even if the user doesn't specify "parent" might sound a tad confusing? Like if I don't specify a "parent" at all in the config, but the user sees "custom" as the "tbon.topo" config, would that be confusing?

I waffled on "custom" vs "kary:0" for that case, but ultimately it seemed simpler (both in code and documentation) to say that the default for config file bootstrap is "custom" unless overridden.

garlick avatar Oct 22 '22 22:10 garlick

Fixed the issues raised so far and forced a push (thanks @chu11!)

garlick avatar Oct 22 '22 22:10 garlick

I waffled on "custom" vs "kary:0" for that case, but ultimately it seemed simpler (both in code and documentation) to say that the default for config file bootstrap is "custom" unless overridden.

Ahhh it'd be difficult to convey "kary:0" default in some cases but "kary:2" default in other cases. Yeah, pros and cons on each approach. If you're not won over with my "flat" idea, then the current is ok.

chu11 avatar Oct 23 '22 03:10 chu11

Yeah, I think this way is slightly less confusing so I'll leave it as is unless somebody feels more strongly than me :-)

One thing I do not like in retrospect about the code for custom is that I added topology_hosts_set() so that the bootstrap code could make the expanded hosts array available to the custom plugin. Instead, I should add a topology aux container so that arbitrary data can be made available. Since there's already a topology_aux_set() for associating data with ranks, I'll rename that to topology_rank_aux_set() which is clearer anyway. (I think an aux container is probably better in this case than making broker stuff globally available, since this topology class is otherwise standalone. And anyway the expanded hosts array is ephemeral data so it wouldn't fit that model anyway)

Edit: except this data is needed before topology_create() is called so it is global :-( I don't have a great way to resolve this so I think maybe this is the least worst for now. I'll go ahead and push my commit for renaming the existing aux functions since the name was misleading anyhow.

garlick avatar Oct 23 '22 17:10 garlick

Codecov Report

Merging #4675 (3e68125) into master (62c95e2) will decrease coverage by 0.03%. The diff coverage is 82.79%.

@@            Coverage Diff             @@
##           master    #4675      +/-   ##
==========================================
- Coverage   83.40%   83.37%   -0.04%     
==========================================
  Files         413      413              
  Lines       69489    69606     +117     
==========================================
+ Hits        57956    58031      +75     
- Misses      11533    11575      +42     
Impacted Files Coverage Δ
src/bindings/lua/flux-lua.c 87.68% <ø> (-0.04%) :arrow_down:
src/broker/overlay.c 85.99% <72.72%> (-0.50%) :arrow_down:
src/broker/boot_config.c 81.72% <74.00%> (-0.03%) :arrow_down:
src/broker/boot_pmi.c 66.93% <87.50%> (-0.68%) :arrow_down:
src/broker/topology.c 92.70% <90.42%> (-0.77%) :arrow_down:
src/connectors/loop/loop.c 81.96% <100.00%> (-0.30%) :arrow_down:
src/common/libsubprocess/server.c 60.54% <0.00%> (-3.79%) :arrow_down:
src/broker/attr.c 81.66% <0.00%> (-3.47%) :arrow_down:
src/cmd/flux-mini.py 92.35% <0.00%> (-0.25%) :arrow_down:
... and 4 more

codecov[bot] avatar Oct 23 '22 19:10 codecov[bot]

Thanks! I think this is probably OK and if we get it merged we do have time to tweak it before the next release, if needed. I'll go ahead and set MWP.

garlick avatar Oct 23 '22 23:10 garlick