broker: need more useful progress indication when starting a large instance

Open garlick opened this issue 1 year ago • 2 comments

Problem: need better feedback to users when brokers are slow to start up in a big instance (like a large flux alloc).

If not all of the brokers enter the PMI barrier, there is no feedback. To reproduce, run:

$ flux start -s 64 --test-start-mode=leader
[wait forever]
^Cflux-broker: simple: barrier: operation failed
flux-broker: bootstrap failed

If a node completes PMI bootstrap but then fails to wire up, messages like this appear periodically (every 10s in the run below):

$ flux start -s 64 -o,-Sbroker.quorum-timeout=10s
Apr 10 19:33:12.898135 broker.err[0]: quorum delayed: waiting for system76-pc (rank 63)
Apr 10 19:33:22.899014 broker.err[0]: quorum delayed: waiting for system76-pc (rank 63)
Apr 10 19:33:32.900086 broker.err[0]: quorum delayed: waiting for system76-pc (rank 63)
Apr 10 19:33:42.901166 broker.err[0]: quorum delayed: waiting for system76-pc (rank 63)
Apr 10 19:33:52.901839 broker.err[0]: quorum delayed: waiting for system76-pc (rank 63)
Apr 10 19:34:02.902736 broker.err[0]: quorum delayed: waiting for system76-pc (rank 63)
Apr 10 19:34:12.903074 broker.err[0]: quorum delayed: waiting for system76-pc (rank 63)
Apr 10 19:34:22.903295 broker.err[0]: quorum delayed: waiting for system76-pc (rank 63)
Apr 10 19:34:32.903458 broker.err[0]: quorum delayed: waiting for system76-pc (rank 63)
Apr 10 19:34:42.903722 broker.err[0]: quorum delayed: waiting for system76-pc (rank 63)
Apr 10 19:34:52.904452 broker.err[0]: quorum delayed: waiting for system76-pc (rank 63)
Apr 10 19:35:02.905143 broker.err[0]: quorum delayed: waiting for system76-pc (rank 63)
Apr 10 19:35:03.278572 broker.err[0]: quorum reached

To reproduce that, I added the following patch:

diff --git a/src/broker/broker.c b/src/broker/broker.c
index 971b48732..53515fc54 100644
--- a/src/broker/broker.c
+++ b/src/broker/broker.c
@@ -404,6 +404,9 @@ int main (int argc, char *argv[])
                   flux_reactor_now (ctx.reactor) - ctx.starttime);
     }
 
+    if (ctx.rank == 63)
+        sleep (120);
+
     // Setup profiling
     setup_profiling (argv[0], ctx.rank);
 

garlick • Apr 11 '24 02:04

When this came up before I was playing with a broker --progress option (on my broker_progress branch which has one commit, 36ff1e2a4285636641294b85d46c6b1091d4ba7c).

It prints:

flux-broker: waiting for remaining brokers to join: 63 of 64

with the numbers rewritten in place. That works in addition to the "quorum delayed" message described above, which periodically calls out the missing hostnames.
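
For illustration, rewriting a status line in place can be sketched in C as below. This is not the code from the broker_progress branch. The helper names `progress_format` and `progress_update` are hypothetical, and the two numbers are passed through as-is. The idea is only that a leading carriage return with no trailing newline lets each update overwrite the previous one.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper, not taken from the broker_progress branch.
 * Formats one progress update into buf.  The leading '\r' returns the
 * cursor to column 0 so the next update overwrites this one in place.
 * Returns the number of characters written, or -1 if buf is too small.
 */
int progress_format (char *buf, size_t len, int count, int size)
{
    assert (buf != NULL && len > 0);
    int n = snprintf (buf,
                      len,
                      "\rflux-broker: waiting for remaining brokers to join: "
                      "%d of %d",
                      count,
                      size);
    if (n < 0 || (size_t)n >= len)
        return -1;
    return n;
}

/* Write one update to stream and flush it explicitly, since there is
 * no trailing newline to trigger a flush on a line-buffered stream.
 * Returns 0 on success, -1 on error.
 */
int progress_update (FILE *stream, int count, int size)
{
    char buf[128];

    if (progress_format (buf, sizeof (buf), count, size) < 0)
        return -1;
    if (fputs (buf, stream) == EOF || fflush (stream) == EOF)
        return -1;
    return 0;
}
```

A caller would invoke `progress_update (stderr, count, size)` each time the count changes and print a final newline once startup completes. This only makes sense when the stream is a terminal. When output is redirected to a file, one line per update is probably the better choice.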

I stalled out on this before because I wasn't really sure how to integrate this into the overall system.

garlick • Apr 11 '24 02:04

One idea is to expose the quorum progress via an RPC, so that responsibility for indicating progress can be handled by flux job attach (or any other interested tool). The tool can open a handle to the instance as soon as the uri attribute is posted to the eventlog, then monitor progress. This is currently how flux alloc --bg works, but it only monitors state-machine.wait.
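
A rough sketch of that split of responsibility, in plain C with no Flux calls: the instance side is reduced to a fetch callback standing in for the proposed RPC, and the tool side decides how to report. Everything here is hypothetical (the `quorum_progress` struct, its field names, `monitor_quorum`, and the scripted source used to drive it). No such RPC exists in the source, and a real tool would more likely use a streaming response than a polling loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical snapshot of quorum progress; field names are illustrative. */
struct quorum_progress {
    int online; /* brokers that have joined so far */
    int size;   /* total brokers in the instance */
};

/* Stand-in for the proposed RPC: fills *p, returns 0 on success, -1 on error */
typedef int (*progress_fetch_f) (void *arg, struct quorum_progress *p);

/* Tool-side hook, called only when the online count changes */
typedef void (*progress_report_f) (void *arg, const struct quorum_progress *p);

/* Poll fetch() at most max_polls times, reporting each change.
 * Returns 0 once online >= size, 1 if polls ran out, -1 on fetch error.
 */
int monitor_quorum (progress_fetch_f fetch,
                    void *fetch_arg,
                    progress_report_f report,
                    void *report_arg,
                    int max_polls)
{
    int last = -1;

    assert (fetch != NULL && report != NULL);
    for (int i = 0; i < max_polls; i++) {
        struct quorum_progress p;

        if (fetch (fetch_arg, &p) < 0)
            return -1;
        if (p.online != last) {
            report (report_arg, &p);
            last = p.online;
        }
        if (p.online >= p.size)
            return 0;
    }
    return 1;
}

/* Scripted source that replays a fixed list of counts (for demonstration) */
struct scripted_source {
    const int *counts;
    size_t n;
    size_t pos;
    int size;
};

int scripted_fetch (void *arg, struct quorum_progress *p)
{
    struct scripted_source *src = arg;

    if (src->pos >= src->n)
        return -1; /* script exhausted */
    p->online = src->counts[src->pos++];
    p->size = src->size;
    return 0;
}

/* Reporter that only counts how many times it was called */
void count_reports (void *arg, const struct quorum_progress *p)
{
    (void)p;
    (*(int *)arg)++;
}
```

Driving `monitor_quorum` with `scripted_fetch` and the counts 1, 1, 40, 63, 64 for a size of 64 yields four reports, since the repeated 1 is suppressed, and a return value of 0. Swapping `count_reports` for a function that prints a status line would give the kind of display described earlier in the thread, without the broker itself having to know about terminals.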

grondo • Apr 11 '24 15:04