future fulfilled with unspecified error when broker exits

Open: grondo opened this issue 11 months ago • 1 comment

When a process connected to a broker is waiting for a future to be fulfilled and the broker exits, flux_future_get() returns an error (-1), but errno is either not set or invalid. The result is a confusing error message, such as the "Success" reported by flux-dmesg in the example output.
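Until the root cause is found, a caller could guard against a failure that leaves errno at zero. The following is only a sketch of that idea: failing_get_stub() and call_with_errno_fallback() are hypothetical names invented for illustration, not flux-core functions, and the stub merely imitates the reported behavior (return -1, errno untouched).

```c
#include <errno.h>

/* Hypothetical stand-in for a call that fails without setting errno,
 * imitating the behavior reported in this issue. Not a flux-core function.
 */
static int failing_get_stub (void)
{
    return -1;
}

/* Hypothetical helper: clear errno, invoke fn, and if fn reports failure
 * while errno is still zero, substitute the caller-supplied fallback so
 * that later strerror()-style reporting has something meaningful to print.
 */
static int call_with_errno_fallback (int (*fn) (void), int fallback)
{
    errno = 0;
    int rc = fn ();
    if (rc < 0 && errno == 0)
        errno = fallback;
    return rc;
}
```

Calling call_with_errno_fallback (failing_get_stub, ECONNRESET) returns -1 with errno set to ECONNRESET. This only papers over the symptom on the caller side; it does not recover whatever errno the library originally saw.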

Here's an example with flux-dmesg(1):

$ src/cmd/flux start -s 4 --test-exit-mode=leader --test-pmi-clique=per-broker -o -Stbon.topo=kary:0 bash -c '(FLUX_URI=$(flux exec -r3 flux getattr local-uri) flux dmesg -HLnf &) && sleep 1 && flux overlay disconnect 3 && wait'
flux-overlay: asking corona212 (rank 0) to disconnect child corona212 (rank 3)
Mar 21 08:21:33.904967 broker.err[0]: corona212 (rank 3) transitioning to LOST due to administrative action
Mar 21 08:21:33.905168 broker.crit[3]: corona212 (rank 0) sent disconnect control message
[Mar21 08:21] broker[3]: corona212 (rank 0) sent disconnect control message
[  +0.000170] broker[3]: shutdown: run->cleanup 1.03355s
[  +0.000205] broker[3]: cleanup-none: cleanup->shutdown 0.02316ms
[  +0.000238] broker[3]: children-none: shutdown->finalize 0.023781ms
[  +0.001014] broker[3]: state-machine.monitor: No route to host
[  +0.084695] broker[3]: rmmod resource
[  +0.085065] broker[3]: module resource exited
[  +0.136105] broker[3]: rmmod job-info
[  +0.136219] broker[3]: module job-info exited
[  +0.185526] broker[3]: rmmod job-ingest
[  +0.185691] broker[3]: module job-ingest exited
[  +0.331674] broker[3]: rmmod barrier
[  +0.331807] broker[3]: module barrier exited
[  +0.380347] broker[3]: rmmod kvs-watch
[  +0.380474] broker[3]: module kvs-watch exited
[  +0.431345] broker[3]: rmmod kvs
[  +0.431458] broker[3]: module kvs exited
[  +0.543153] broker[3]: rmmod content
[  +0.543301] broker[3]: module content exited
[  +0.544237] broker[3]: rc3.0: /g/g0/grondo/git/f.git/etc/rc3 Exited (rc=0) 0.5s
[  +0.544330] broker[3]: rc3-success: finalize->goodbye 0.544083s
[  +0.544404] broker[3]: goodbye: goodbye->exit 0.066921ms
flux-dmesg: log.dmesg: Success
flux-start: 3 (pid 796783) exited with rc=1

Possibly an ECONNRESET or other more useful errno is being overwritten somewhere in the error handling here.
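To illustrate the kind of overwrite suspected here, this is a minimal, self-contained sketch of the usual save-and-restore pattern for errno around cleanup code. It is not taken from flux-core: cleanup_stub(), fail_unsaved(), and fail_saved() are hypothetical names, and the cleanup step resetting errno to zero is an assumption made to reproduce the symptom.

```c
#include <errno.h>

/* Hypothetical cleanup step that clobbers errno as a side effect
 * (assumed for illustration; real cleanup code may do this indirectly
 * through library calls).
 */
static void cleanup_stub (void)
{
    errno = 0;
}

/* Error path that loses the original errno: the caller sees -1 with
 * errno == 0.
 */
static int fail_unsaved (void)
{
    errno = ECONNRESET;     /* the "real" error */
    cleanup_stub ();        /* overwrites errno */
    return -1;
}

/* Error path that preserves the original errno across cleanup. */
static int fail_saved (void)
{
    errno = ECONNRESET;     /* the "real" error */
    int saved_errno = errno;
    cleanup_stub ();
    errno = saved_errno;    /* restore before returning to the caller */
    return -1;
}
```

With glibc, strerror (0) yields "Success", so a caller that formats errno after fail_unsaved() would print a message like the "flux-dmesg: log.dmesg: Success" line above, whereas fail_saved() would leave ECONNRESET in place.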

grondo, Mar 21 '24 15:03