James Corbett

32 issues by James Corbett

Problem: the `set_status` RPC defined in `resource_match.cpp` takes only a single vertex path. The flux-coral2 project sometimes needs to send bulk updates. Since there are only two states, `UP` and...

Using the match policies `low` and `lownode` on this JGF representation of rzvernal: [rzvernal_R_norabbit.json](https://github.com/flux-framework/flux-sched/files/15180006/rzvernal_R_norabbit.json) and the following jobspec:

```yaml
version: 9999
resources:
  - type: slot
    count: 4
    label: default
    with:...
```

On rzadams I marked a rabbit vertex as down, and submitted a job that required a compute node on the same rack as that rabbit. The job, as expected, was...

flux-coral2 software adds a number of entries to jobs' KVS (see https://flux-framework.readthedocs.io/en/latest/tutorials/lab/rabbit.html#additional-attributes-of-rabbit-jobs). @behlendorf noted:

> I can never remember the rabbit_workflow keyword. Could we get that added to the flux...

On rzadams, which was just today configured to use the `rv1` match format:

```
$ flux alloc -N2
flux-job: fqxGU4MP3XV started 00:00:17
Oct 14 19:16:24.615448 PDT sched-fluxion-resource.err[0]: grow_resource_db_jgf: db.load: unpack_edge:...
```

Snipped results of `flux dmesg` on hetchy:

```
2024-08-27T01:46:53.149076Z sched-fluxion-resource.err[0]: run_remove: dfu_traverser_t::remove (id=152883667495027712): mod_plan: traverser tried to remove schedule and span after vtx_cancel during partial cancel:
2024-08-27T01:46:53.149167Z sched-fluxion-resource.err[0]: ssd0.
2024-08-27T01:46:53.149175Z...
```

The following message is repeated fairly regularly on an LC cluster, with different job IDs:

```
[ +14.021503] sched-fluxion-resource[0]: run_remove: dfu_traverser_t::remove (id=345577048527356928): add_or_update: couldn't find vertex in graph for...
```

JGF is verbose, and Rabbit-y JGF on elcap systems can become very large. We discussed offline several ways to shrink JGF, both while maintaining the same format and compatibility with...

With this [JGF](https://github.com/user-attachments/files/16435134/tioga-JGF.json) and the jobspec below, I get the following error even though the match succeeds:

```yaml
version: 9999
resources:
  - type: slot
    count: 2
    label: default
    with:
      -...
```

Fixes the bug I found on our webex today.