James Corbett comments

Results: 85 comments of James Corbett

`t8001` fails in CI, bracket mismatch?

I'm not sure what's going on here, but the relevant code was changed in https://github.com/flux-framework/flux-sched/pull/1149

`t8001` fails in CI, bracket mismatch?

I wonder if somehow an older version of the Python `FluxionResourceGraphV1` class is being picked up? Like maybe there's another version of its module in `sys.path` for some reason?
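
One way to check that hypothesis is a minimal sketch like the one below. It reports which file Python would actually load for a module and lists every copy of that module reachable from `sys.path`. The helper names `find_module_copies` and `resolved_origin` are hypothetical and not part of Flux. The sketch only looks at plain `.py` files and regular packages, so compiled extensions and namespace packages would be missed.

```python
import importlib.util
import os
import sys


def find_module_copies(module_name, search_path=None):
    """List every .py file or package __init__.py on the path for module_name.

    Hypothetical helper: only plain source files and regular packages are
    checked; compiled extensions and namespace packages are ignored.
    """
    if search_path is None:
        search_path = sys.path
    relative = module_name.replace(".", os.sep)
    matches = []
    for entry in search_path:
        # An empty sys.path entry means the current working directory.
        base = os.path.join(entry or os.getcwd(), relative)
        for candidate in (base + ".py", os.path.join(base, "__init__.py")):
            if os.path.isfile(candidate) and candidate not in matches:
                matches.append(candidate)
    return matches


def resolved_origin(module_name):
    """Return the file Python would load for module_name, or None if not found."""
    spec = importlib.util.find_spec(module_name)
    return spec.origin if spec is not None else None
```

Run inside the failing CI environment with the name of whichever module defines `FluxionResourceGraphV1`. If `find_module_copies` returns more than one path, or `resolved_origin` points somewhere other than the freshly built tree, a stale copy is likely shadowing the new one.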

Job submission slows down on Hetchy

By contrast, on Corona the timings are very consistent
```
[corbett8@corona211:~]$ for (( i=0; i
```

Job submission slows down on Hetchy

> Doesn't hetchy have some special resources added to its graph with JGF? You may be able to reproduce this issue in a test instance by loading similar fake resources....

Job submission slows down on Hetchy

Nope, closing.

Run administrative epilog even if job is canceled before starting

This became an issue on rzadams today. A job was canceled while the administrative prolog was running, but after rabbit file systems had mounted and the `nnf-clientmount` daemon had stopped....

traverser: allow non-exclusive slots

With this PR, [using this JGF graph](https://github.com/flux-framework/flux-sched/files/15211777/rzvernal_R_norabbit.json) and the `first` policy and this jobspec:
```yaml
version: 9999
resources:
  - type: slot
    count: 3
    label: default
    exclusive: false
    with:
      - type:...
```

traverser: allow non-exclusive slots

By comparison, keeping everything else constant but working off `master` instead of this PR:
```
resource-query> m allocate ../jobspec.yaml
---------ssd31[894:x]
------------core63[1:x]
---------rzvernal25[1:x]
------rack0[1:s]
---------ssd31[894:x]
------------core63[1:x]
---------rzvernal41[1:x]
------rack1[1:s]
---------ssd31[894:x]
------------core95[1:x]
---------rzvernal53[1:x]...
```

traverser: allow non-exclusive slots

Closing as stale, can always resurrect later.

idea: allow compute nodes to be released by a job before the `clean` event

I think we should circle back to this issue after the resolution of https://github.com/flux-framework/flux-coral2/issues/321, because it's possible the resolution of that issue will be enough. But FWIW I think the...