Mark Grondona

154 issues by Mark Grondona

The default timeout doesn't seem to be firing on actual systems for `flux perilog-run prolog`. There's testing in the testsuite with local scripts, but not with `--exec-per-rank`, so perhaps timeouts...

This PR is meant to be merged after #5818. It updates the default wait-event for `flux job attach` to `clean` from `finish`. This means that `flux run` and `flux alloc`...

Users have reported that there is not enough detail from `flux-job attach` when a job fails. That is, we currently report:
```
flux-job: task(s) exited with exit code 1
```
...

In flux-framework/flux-sched#1222 @trws observed:

> the job-manager processes all the cancels, but keeps sending all the no-longer-valid alloc requests anyway, for something like 10 minutes, before we start getting cancels...

An unknown MPI app on Frontier that was working with flux-core v0.55 started failing after an upgrade to v0.63 with the following: ``` MPICH ERROR [Rank 0] [job id unknown]...

``` grondo@corona211:~$ flux start -s1 --recovery=flux-f3HJRCdtW9kj-dump.tgz flux getattr content.restore flux-getattr: content.restore: No such file or directory Jun 07 13:46:34.477875 broker.err[0]: rc2.0: flux getattr content.restore Exited (rc=1) 0.0s flux-start: 0 (pid...

While working on something I noticed that there was no libflux version of the `ev_is_active()` call, which could come in handy to check if a watcher has been started. This...

This was briefly discussed in the dev meeting last week, but there's a somewhat constant load (40% of a CPU) on the rank 0 broker on a large system attributed...

Users have been seeing this error in their batch job logs frequently on elcap: ``` Jun 26 12:53:18.225473 PDT sched-fluxion-resource.err[0]: match_multi_request_cb: match failed due to match error (id=578897838080): Invalid argument...

A question sysadmins and developers often get is "why is job X not running?" It seems like Fluxion could provide insights to make this question easier to answer, perhaps even...