Mark Grondona
While reloading fluxion on elcap, several pending jobs were canceled with a fatal job exception such as: ``` [Jun04 14:42] exception type="alloc" severity=0 note="alloc denied due to type=\"match error\"" userid=765...
I was testing reconfiguration without a restart via `resource` and Fluxion module reload and ran into one issue. Here's a test that somewhat randomly redistributes resources among queues while jobs...
I was able to create a reproducer of some kind of fluxion performance problem via the following sharness test, which creates 16K fake resources split into 3 different sets via...
Flux supports opt-in configuration reload for modules. When `flux config reload` is used, the broker re-reads configuration and sends a `.config-reload` RPC to all modules. The new config can be...
It appears that in c3ef9a2f242b36808da92e2ef613bb80299e6f4f some of the autotools setup for the Spindle Flux plugin was removed. Unfortunately, this breaks proper (or perhaps more correctly _convenient_) installation of the Flux...
A use case came up recently where it would be useful to be able to determine if a job has explicitly or implicitly obtained exclusive access to resources on the...
On tuolumne, we're seeing sets of drained nodes with 'unkillable processes' even though there are no processes running when admins investigate after the fact. In one instance, a job was...
Problem: The job-exec module drains nodes with what it considers "unkillable" processes after `max-kill-count` attempts have been made to terminate the job shell. However, it is difficult for admins to...
The exit-timeout exceptions have been somewhat confusing to users because when they pop up in a batch job error log, it isn't immediately clear _which_ job hit the exit timeout....
Problem: The perilog plugin currently does not raise an exception when the epilog fails, as documented in this comment: https://github.com/flux-framework/flux-core/blob/d4cdf62a1ddc1ea636afe4918e6c34e118fabf23/src/modules/job-manager/plugins/perilog.c#L422-L429 This makes sense when the job-manager epilogs were mostly meant...