Mark Grondona
I was able to reproduce memory corruption (though not on the rank 0 broker) by running a 512 broker instance across 64 nodes with a flat tbon topology `-Stbon.topo=kary:512`, e.g....
This might be a false alarm. I noticed that all the memory corruption crashes were on the same node:
```
$ ls *.core
corona265-flux-broker-425-219062.core  corona265-flux-broker-430-245463.core
corona265-flux-broker-426-245426.core  corona265-flux-broker-431-225521.core
corona265-flux-broker-428-232894.core  corona265-flux-broker-431-245473.core
```
...
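As a rough way to double-check the "same node" observation, the core file names can be grouped by hostname. This is only a sketch: it assumes the names follow a `<host>-flux-broker-<rank>-<pid>.core` pattern (inferred from the listing, not confirmed), and `count_by_host` is a hypothetical helper, not part of Flux.

```python
import re
from collections import Counter

# Core file names copied from the `ls *.core` listing above.
CORE_FILES = [
    "corona265-flux-broker-425-219062.core",
    "corona265-flux-broker-430-245463.core",
    "corona265-flux-broker-426-245426.core",
    "corona265-flux-broker-431-225521.core",
    "corona265-flux-broker-428-232894.core",
    "corona265-flux-broker-431-245473.core",
]

# Assumed naming pattern: <host>-flux-broker-<rank>-<pid>.core
CORE_RE = re.compile(
    r"^(?P<host>.+)-flux-broker-(?P<rank>\d+)-(?P<pid>\d+)\.core$"
)


def count_by_host(names):
    """Count core files per host; names that do not match are ignored."""
    hosts = Counter()
    for name in names:
        match = CORE_RE.match(name)
        if match:
            hosts[match.group("host")] += 1
    return hosts


if __name__ == "__main__":
    for host, count in count_by_host(CORE_FILES).items():
        print(f"{host}: {count} core file(s)")
```

If every core file maps to a single host, a node-local problem (bad memory, for example) becomes a more plausible explanation than a broker bug.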
This _may_ have been fixed by #5803, but we've not been able to get a non-truncated core file on that system yet, so we can't be sure. Shall we close...
I wonder if the broker could capture the fact that termination is due to a signal here, and skip all the "was not properly shutdown" and "skipping 0MQ shutdown" errors,...
This has nothing to do with #4569 though, since that issue deals with job events. Were you thinking a utility or service that would aggregate all known eventlogs into a...
> The problem is that you need to be able to define those attributes without writing a yaml file every time?

There is already a facility for specifying system attributes...
> Would the submit time (called t_submit in qmanager) work as the deferred_from value?

That is a great idea. I was going to suggest something similar in that `t_submit` could...
The `--begin-time` option uses a timestamp (absolute time) which is obtained by parsing the user's argument with our Python `parse_datetime()` function:
```
--begin-time=DATETIME Convenience option for setting a begin-time dependency...
```
I'm confused. As shown above, the interface does not require users to actually specify the timestamp. The begin time can be specified as an offset or absolute time or any...
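To illustrate the point that an offset and an absolute time both end up as the same kind of value, here is a minimal sketch of resolving either form to an absolute POSIX timestamp. It is not Flux's `parse_datetime()`: `resolve_begin_time` is a hypothetical helper, and the `+<number><unit>` offset syntax and ISO 8601 absolute format are assumptions chosen for the example.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumed offset syntax for this sketch: "+<number><unit>", unit in s/m/h/d.
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}
_OFFSET_RE = re.compile(r"^\+(\d+(?:\.\d+)?)([smhd])$")


def resolve_begin_time(arg, now):
    """Return an absolute POSIX timestamp for an offset or ISO 8601 string.

    Hypothetical helper for illustration only. `now` must be a
    timezone-aware datetime used as the base for offsets.
    """
    match = _OFFSET_RE.match(arg)
    if match:
        amount, unit = float(match.group(1)), match.group(2)
        return (now + timedelta(**{_UNITS[unit]: amount})).timestamp()
    when = datetime.fromisoformat(arg)
    if when.tzinfo is None:
        # Treat naive timestamps as UTC to keep the sketch deterministic.
        when = when.replace(tzinfo=timezone.utc)
    return when.timestamp()


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(resolve_begin_time("+1h", now))
    print(resolve_begin_time("2030-01-01T00:00:00", now))
```

Either way the caller receives an absolute timestamp, so the user never has to compute one by hand.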
I verified that this case (reloading `sched-fluxion-resource` with `match-format=rv1_nosched`) is not covered by the Fluxion testsuite. Strangely, many other cases are tested, so I wonder if this was intentional? Anyway, adding this...