James Corbett
Problem: the `set_status` RPC defined in `resource_match.cpp` takes only a single vertex path. The flux-coral2 project sometimes needs to send bulk updates. Since there are only two states, `UP` and...
Using the match policies `low` and `lownode` on this JGF representation of rzvernal: [rzvernal_R_norabbit.json](https://github.com/flux-framework/flux-sched/files/15180006/rzvernal_R_norabbit.json) and the following jobspec: ```yaml version: 9999 resources: - type: slot count: 4 label: default with:...
On rzadams I marked a rabbit vertex as down, and submitted a job that required a compute node on the same rack as that rabbit. The job, as expected, was...
flux-coral2 software adds a number of entries to jobs' KVS (see https://flux-framework.readthedocs.io/en/latest/tutorials/lab/rabbit.html#additional-attributes-of-rabbit-jobs). @behlendorf noted: > I can never remember the rabbit_workflow keyword. Could we get that added to the flux...
On rzadams, which was just today configured to use the `rv1` match format: ``` $ flux alloc -N2 flux-job: fqxGU4MP3XV started 00:00:17 Oct 14 19:16:24.615448 PDT sched-fluxion-resource.err[0]: grow_resource_db_jgf: db.load: unpack_edge:...
Snipped results of `flux dmesg` on hetchy: ``` 2024-08-27T01:46:53.149076Z sched-fluxion-resource.err[0]: run_remove: dfu_traverser_t::remove (id=152883667495027712): mod_plan: traverser tried to remove schedule and span after vtx_cancel during partial cancel: 2024-08-27T01:46:53.149167Z sched-fluxion-resource.err[0]: ssd0. 2024-08-27T01:46:53.149175Z...
The following message is being repeated somewhat regularly on an LC cluster with different job IDs. ``` [ +14.021503] sched-fluxion-resource[0]: run_remove: dfu_traverser_t::remove (id=345577048527356928): add_or_update: couldn't find vertex in graph for...
JGF is verbose, and Rabbit-y JGF on elcap systems can become very large. We discussed offline several ways to shrink JGF, both while maintaining the same format and compatibility with...
With this [JGF](https://github.com/user-attachments/files/16435134/tioga-JGF.json) and the jobspec below, I get the following error even though the match succeeds: ```yaml version: 9999 resources: - type: slot count: 2 label: default with: -...
Fixes the bug I found on our Webex call today.