flux-core
future fulfilled with unspecified error when broker exits
When a process connected to a broker is waiting for a future to be fulfilled and the broker exits, flux_future_get() returns an error (-1), but errno is either not set or invalid, resulting in confusing error messages.
Here's an example with flux-dmesg(1):
$ src/cmd/flux start -s 4 --test-exit-mode=leader --test-pmi-clique=per-broker -o -Stbon.topo=kary:0 bash -c '(FLUX_URI=$(flux exec -r3 flux getattr local-uri) flux dmesg -HLnf &) && sleep 1 && flux overlay disconnect 3 && wait'
flux-overlay: asking corona212 (rank 0) to disconnect child corona212 (rank 3)
Mar 21 08:21:33.904967 broker.err[0]: corona212 (rank 3) transitioning to LOST due to administrative action
Mar 21 08:21:33.905168 broker.crit[3]: corona212 (rank 0) sent disconnect control message
[Mar21 08:21] broker[3]: corona212 (rank 0) sent disconnect control message
[ +0.000170] broker[3]: shutdown: run->cleanup 1.03355s
[ +0.000205] broker[3]: cleanup-none: cleanup->shutdown 0.02316ms
[ +0.000238] broker[3]: children-none: shutdown->finalize 0.023781ms
[ +0.001014] broker[3]: state-machine.monitor: No route to host
[ +0.084695] broker[3]: rmmod resource
[ +0.085065] broker[3]: module resource exited
[ +0.136105] broker[3]: rmmod job-info
[ +0.136219] broker[3]: module job-info exited
[ +0.185526] broker[3]: rmmod job-ingest
[ +0.185691] broker[3]: module job-ingest exited
[ +0.331674] broker[3]: rmmod barrier
[ +0.331807] broker[3]: module barrier exited
[ +0.380347] broker[3]: rmmod kvs-watch
[ +0.380474] broker[3]: module kvs-watch exited
[ +0.431345] broker[3]: rmmod kvs
[ +0.431458] broker[3]: module kvs exited
[ +0.543153] broker[3]: rmmod content
[ +0.543301] broker[3]: module content exited
[ +0.544237] broker[3]: rc3.0: /g/g0/grondo/git/f.git/etc/rc3 Exited (rc=0) 0.5s
[ +0.544330] broker[3]: rc3-success: finalize->goodbye 0.544083s
[ +0.544404] broker[3]: goodbye: goodbye->exit 0.066921ms
flux-dmesg: log.dmesg: Success
flux-start: 3 (pid 796783) exited with rc=1
Possibly an ECONNRESET or other more useful errno is being overwritten somewhere in the error handling here.
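To illustrate the kind of bug suspected here, below is a minimal generic sketch of how an errno can be lost on an error path and how saving and restoring it avoids that. This is not flux-core code: `fake_recv()`, `fake_cleanup()`, `get_clobbered()` and `get_preserved()` are hypothetical names made up for the example, and the cleanup step simply simulates any call that modifies errno as a side effect.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for a receive that fails because the peer
 * (e.g. the broker) went away. */
static int fake_recv (void)
{
    errno = ECONNRESET;
    return -1;
}

/* Hypothetical cleanup step.  Library calls are allowed to modify errno
 * even when they succeed; that side effect is simulated here by
 * resetting errno to 0. */
static void fake_cleanup (void)
{
    errno = 0;
}

/* Suspected pattern: cleanup runs after the failure, so the caller
 * sees -1 with an errno that no longer describes the failure. */
int get_clobbered (void)
{
    if (fake_recv () < 0) {
        fake_cleanup ();
        return -1;
    }
    return 0;
}

/* Common remedy: save errno before cleanup and restore it afterwards. */
int get_preserved (void)
{
    if (fake_recv () < 0) {
        int saved_errno = errno;
        assert (saved_errno != 0); /* the failed call should have set errno */
        fake_cleanup ();
        errno = saved_errno;
        return -1;
    }
    return 0;
}
```

After `get_clobbered()` returns -1, errno is 0, and on glibc `strerror (0)` is "Success", which would be consistent with the `log.dmesg: Success` message above if errno ended up as zero. After `get_preserved()` returns -1, errno is still ECONNRESET.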