Mark Grondona
During a recent system update, `flux shutdown` took over 40 minutes with little feedback to indicate progress. Admins have requested better insight into shutdown progress. This will give them a better indication...
On a large, busy system, the Flux logs are filled with `tagpool expanded` messages, obscuring other, more important messages and errors.
During a recent experiment to enable JGF on a large system, the system was slow and a job hung at startup. A decision was made to abort the experiment...
Problem: `flux resource drain` reports only the current state of drained resources; it does not include historical drain events or undrain reasons, though these are available in the resource eventlog. Some...
Problem: It would sometimes be convenient if error messages were labeled with a source and timestamp even for file output. This would aid in debugging jobs. It would probably be...
Working with @cmoussa1 on flux-framework/flux-accounting#774, we were trying to understand how the issue was even reproducible. To summarize, the accounting scripts were hitting an unexpected case where a job has...
This came up in a user query, and the solution is not exactly intuitive, so I'm placing it in an issue so it is searchable and also to discuss if...
Sometimes a user wants to report a jobid running within a subinstance (e.g. a `flux run` within a batch job) for debugging, but a jobid alone is not enough for...
During a recent Flux upgrade, new packages were installed while Flux was running. These packages removed rc3 as part of the modprobe transition, which later prevented an orderly shutdown when...
The current Lua bindings were written for an older Flux API and must either be updated or removed. It isn't clear if there is really a use case for the...