James Corbett

Results: 85 comments by James Corbett

I'm not sure what's going on here, but the relevant code was changed in https://github.com/flux-framework/flux-sched/pull/1149

I wonder if somehow an older version of the Python `FluxionResourceGraphV1` class is being picked up? Like maybe there's another version of its module in `sys.path` for some reason?
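One way to test the shadowing hypothesis is to ask Python which file it would actually load for a module, and to list every `sys.path` entry that contains a same-named package or `.py` file. The sketch below uses only the standard library. `loaded_origin` and `candidate_copies` are hypothetical helpers, not part of flux-sched, and the dotted module name you pass in (wherever `FluxionResourceGraphV1` lives in your install) is an assumption you would fill in locally.

```python
import importlib.util
import os
import sys


def loaded_origin(module_name):
    """Return the file Python would load for module_name, or None if not found.

    Note: for a dotted name, find_spec imports the parent packages first.
    """
    try:
        spec = importlib.util.find_spec(module_name)
    except ModuleNotFoundError:
        # Raised when a parent package of a dotted name is missing.
        return None
    return spec.origin if spec is not None else None


def candidate_copies(module_name, search_path=None):
    """List search-path entries holding a directory or .py file named after
    the top-level component of module_name, in search order."""
    top = module_name.split(".")[0]
    hits = []
    for entry in sys.path if search_path is None else search_path:
        # An empty sys.path entry means the current working directory.
        base = os.path.join(entry or os.getcwd(), top)
        if os.path.isdir(base) or os.path.isfile(base + ".py"):
            hits.append(base)
    return hits
```

If `candidate_copies` reports more than one location, the earliest entry is typically the one the default path-based importer picks up; comparing it against `loaded_origin` shows whether that is the expected install or a stale copy.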

By contrast, on Corona the timings are very consistent:

```
[corbett8@corona211:~]$ for (( i=0; i
...
```

> Doesn't hetchy have some special resources added to its graph with JGF? You may be able to reproduce this issue in a test instance by loading similar fake resources....

This became an issue on rzadams today. A job was canceled while the administrative prolog was running, but after rabbit file systems had mounted and the `nnf-clientmount` daemon had stopped....

With this PR, [using this JGF graph](https://github.com/flux-framework/flux-sched/files/15211777/rzvernal_R_norabbit.json) and the `first` policy and this jobspec:

```yaml
version: 9999
resources:
  - type: slot
    count: 3
    label: default
    exclusive: false
    with:
      - type:...
```

By comparison, keeping everything else constant but working off `master` instead of this PR:

```
resource-query> m allocate ../jobspec.yaml
---------ssd31[894:x]
------------core63[1:x]
---------rzvernal25[1:x]
------rack0[1:s]
---------ssd31[894:x]
------------core63[1:x]
---------rzvernal41[1:x]
------rack1[1:s]
---------ssd31[894:x]
------------core95[1:x]
---------rzvernal53[1:x]
...
```
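To compare two allocations like these vertex by vertex, the tree printed by `resource-query` can be parsed mechanically. This sketch assumes each line is a run of dashes (three per tree level), a vertex name, and `[amount:x]` or `[amount:s]` for an exclusive or shared allocation, as in the output above. `parse_simple_output` and `totals_by_type` are hypothetical helpers, not part of flux-sched.

```python
import re
from collections import Counter
from typing import List, NamedTuple

# One tree line: leading dashes, vertex name, then [amount:mode] where
# mode is "x" (exclusive) or "s" (shared).
LINE_RE = re.compile(r"^(-+)(\S+?)\[(\d+):([xs])\]$")


class Vertex(NamedTuple):
    depth: int        # tree depth, assuming three dashes per level
    name: str         # e.g. "ssd31"
    amount: int       # e.g. 894
    exclusive: bool   # True for ":x", False for ":s"


def parse_simple_output(text: str) -> List[Vertex]:
    """Parse resource-query tree lines, skipping prompts and other text."""
    vertices = []
    for raw in text.splitlines():
        m = LINE_RE.match(raw.strip())
        if m is None:
            continue
        dashes, name, amount, mode = m.groups()
        vertices.append(Vertex(len(dashes) // 3, name, int(amount), mode == "x"))
    return vertices


def totals_by_type(vertices: List[Vertex]) -> Counter:
    """Sum amounts per resource type, taking the type to be the name with
    trailing digits removed (e.g. "ssd31" -> "ssd"); shared and exclusive
    vertices are both counted."""
    totals = Counter()
    for v in vertices:
        totals[re.sub(r"\d+$", "", v.name)] += v.amount
    return totals
```

Running the full output from each branch through `totals_by_type` gives, for instance, the total `ssd` amount allocated per run, which makes it easier to put this PR and `master` side by side.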

Closing as stale; we can always resurrect it later.

I think we should circle back to this issue after https://github.com/flux-framework/flux-coral2/issues/321 is resolved, because it's possible that resolving that issue will be enough. But FWIW I think the...