fatal job exception raised on pending jobs when reloading Fluxion modules

Open: grondo opened this issue 8 months ago • 1 comment

While reloading the Fluxion modules on elcap, several pending jobs were canceled with a fatal job exception such as:

[Jun04 14:42] exception type="alloc" severity=0 note="alloc denied due to type=\"match error\"" userid=765
[  +0.000608] clean

For reference, here are the logs from the time of the module reload:

[Jun04 14:42] broker[0]: rmmod sched-fluxion-resource
[ +14.008927] sched-fluxion-resource[0]: responding to post-shutdown sched-fluxion-resource.cancel
[ +14.009019] broker[0]: module sched-fluxion-resource exited
[ +14.012128] sched-fluxion-qmanager[0]: check_watcher_cb: run_sched_loop: Function not implemented
[ +14.014486] sched-fluxion-qmanager[0]: check_watcher_cb: run_sched_loop: Function not implemented
[ +14.015532] sched-fluxion-qmanager[0]: check_watcher_cb: run_sched_loop: Function not implemented
[ +14.045507] sched-fluxion-qmanager[0]: check_watcher_cb: run_sched_loop: Function not implemented
[ +14.087013] broker[0]: rmmod resource
[ +14.087290] sched-fluxion-qmanager[0]: check_watcher_cb: run_sched_loop: Function not implemented
[ +14.103970] sched-fluxion-qmanager[0]: check_watcher_cb: run_sched_loop: Function not implemented
[ +14.104489] sched-fluxion-qmanager[0]: check_watcher_cb: run_sched_loop: Function not implemented
[ +14.104968] sched-fluxion-qmanager[0]: check_watcher_cb: run_sched_loop: Function not implemented
[ +14.105501] sched-fluxion-qmanager[0]: check_watcher_cb: run_sched_loop: Function not implemented
[ +14.105973] sched-fluxion-qmanager[0]: check_watcher_cb: run_sched_loop: Function not implemented
[ +14.106463] sched-fluxion-qmanager[0]: check_watcher_cb: run_sched_loop: Function not implemented
[ +14.122417] sched-fluxion-qmanager[0]: responding to post-shutdown sched.ping
[ +14.122435] sched-fluxion-qmanager[0]: responding to post-shutdown sched.ping
[ +14.122442] sched-fluxion-qmanager[0]: responding to post-shutdown sched.ping
[ +14.122447] sched-fluxion-qmanager[0]: responding to post-shutdown sched.ping
[ +14.122451] sched-fluxion-qmanager[0]: responding to post-shutdown sched.ping
[ +14.122456] sched-fluxion-qmanager[0]: responding to post-shutdown sched.ping
[ +14.122461] sched-fluxion-qmanager[0]: responding to post-shutdown sched.ping
[ +14.122465] sched-fluxion-qmanager[0]: responding to post-shutdown sched.disconnect
[ +14.122469] sched-fluxion-qmanager[0]: responding to post-shutdown sched-fluxion-qmanager.ping
[ +14.122474] sched-fluxion-qmanager[0]: responding to post-shutdown sched-fluxion-qmanager.ping
[ +14.122479] sched-fluxion-qmanager[0]: responding to post-shutdown sched-fluxion-qmanager.ping
[ +14.122483] sched-fluxion-qmanager[0]: responding to post-shutdown sched-fluxion-qmanager.ping
[ +14.122488] sched-fluxion-qmanager[0]: responding to post-shutdown sched-fluxion-qmanager.ping
[ +14.122492] sched-fluxion-qmanager[0]: responding to post-shutdown sched-fluxion-qmanager.disconnect
[ +14.122496] sched-fluxion-qmanager[0]: responding to post-shutdown sched.free
[ +14.122500] sched-fluxion-qmanager[0]: responding to post-shutdown sched.free
[ +14.122505] sched-fluxion-qmanager[0]: responding to post-shutdown sched.free
[ +14.122510] sched-fluxion-qmanager[0]: responding to post-shutdown sched.free
[ +14.122514] sched-fluxion-qmanager[0]: responding to post-shutdown sched.free
[ +14.122518] sched-fluxion-qmanager[0]: responding to post-shutdown sched.free
[ +14.122529] sched-fluxion-qmanager[0]: responding to post-shutdown sched.free
[ +14.122534] sched-fluxion-qmanager[0]: responding to post-shutdown sched.free
[ +14.122538] sched-fluxion-qmanager[0]: responding to post-shutdown sched.free
[ +14.122543] sched-fluxion-qmanager[0]: responding to post-shutdown sched.free
[ +14.122546] sched-fluxion-qmanager[0]: responding to post-shutdown sched.free
[ +14.122550] sched-fluxion-qmanager[0]: responding to post-shutdown sched.free
[ +14.122554] sched-fluxion-qmanager[0]: responding to post-shutdown sched.free
[ +14.122558] sched-fluxion-qmanager[0]: responding to post-shutdown sched.free
[ +14.122563] sched-fluxion-qmanager[0]: responding to post-shutdown sched.free
[ +14.122580] sched-fluxion-qmanager[0]: responding to post-shutdown sched.free
[ +14.122585] sched-fluxion-qmanager[0]: responding to post-shutdown sched.free
[ +14.122590] sched-fluxion-qmanager[0]: responding to post-shutdown sched.free
[ +14.122594] sched-fluxion-qmanager[0]: responding to post-shutdown sched.free
[ +14.122599] sched-fluxion-qmanager[0]: responding to post-shutdown sched.free
[ +14.122603] sched-fluxion-qmanager[0]: responding to post-shutdown sched.free
[ +14.122608] sched-fluxion-qmanager[0]: responding to post-shutdown sched.free
[ +14.122612] sched-fluxion-qmanager[0]: responding to post-shutdown sched.free
[ +14.122635] sched-fluxion-qmanager[0]: responding to post-shutdown sched.cancel
[ +14.122639] sched-fluxion-qmanager[0]: responding to post-shutdown sched.cancel
[ +14.122642] sched-fluxion-qmanager[0]: responding to post-shutdown sched.free
[ +14.122648] sched-fluxion-qmanager[0]: responding to post-shutdown sched.free
[ +14.122652] sched-fluxion-qmanager[0]: responding to post-shutdown sched.free
[ +14.139690] broker[0]: module sched-fluxion-qmanager exited
[ +14.139745] job-manager[0]: alloc: stop due to disconnect: Success
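
Since the log above is dominated by repeated messages, a condensed view may be easier to scan. The following is a minimal sketch that tallies identical messages per module. The regular expression and the `summarize` helper are assumptions inferred from the excerpt above, not a format or tool provided by Flux, and the sample input is just a few lines copied from the log.

```python
import re
from collections import Counter

# Matches lines such as:
#   [ +14.122417] sched-fluxion-qmanager[0]: responding to post-shutdown sched.ping
# The pattern is an assumption based on the excerpt above, not a Flux-defined format.
LINE_RE = re.compile(
    r"^\[[^\]]*\]\s+(?P<module>[\w-]+)\[(?P<rank>\d+)\]:\s+(?P<msg>.*)$"
)


def summarize(log_text):
    """Count identical (module, message) pairs in the log text."""
    counts = Counter()
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:  # lines that do not look like log entries are skipped
            counts[(m.group("module"), m.group("msg"))] += 1
    return counts


if __name__ == "__main__":
    sample = """\
[Jun04 14:42] broker[0]: rmmod sched-fluxion-resource
[ +14.012128] sched-fluxion-qmanager[0]: check_watcher_cb: run_sched_loop: Function not implemented
[ +14.014486] sched-fluxion-qmanager[0]: check_watcher_cb: run_sched_loop: Function not implemented
[ +14.122417] sched-fluxion-qmanager[0]: responding to post-shutdown sched.ping
"""
    # Print the most frequent messages first.
    for (module, msg), n in summarize(sample).most_common():
        print(f"{n:3d}  {module}: {msg}")
```

Run against the full log, this groups the `run_sched_loop: Function not implemented` errors and the `post-shutdown` responses into one line each, which makes the ordering of the `rmmod` and `exited` events easier to pick out.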

grondo, Jun 04 '24 22:06