James Corbett
The system instance's resource graph has `cluster -> rack -> node`. The JGF it writes out for child instances does not include rack vertices; however, it still writes out the...
Strangely, `hetchy` does not have this problem; it writes out the `rack` vertex. Something is off, and since this is the same cluster as #1305, I wonder if the JGF...
Some nodes on the cluster hit the issue; some don't. Here is the JGF for the overall system, and the JGF for one node that hit the error and another...
I didn't see any obvious errors in the system JGF but I may well have missed something.
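One way to check a JGF for this kind of inconsistency mechanically is to look for edges that reference vertex ids missing from the node list. This is only a minimal sketch: it assumes the usual JGF layout (`graph.nodes[].id`, `graph.edges[].source` / `target`), and both the `find_dangling_edges` helper and the toy graph are hypothetical, not taken from the attached files.

```python
def find_dangling_edges(jgf):
    """Return edges whose source or target id is not among the graph's nodes."""
    graph = jgf["graph"]
    node_ids = {node["id"] for node in graph.get("nodes", [])}
    return [
        edge
        for edge in graph.get("edges", [])
        if edge["source"] not in node_ids or edge["target"] not in node_ids
    ]


# Hypothetical toy JGF: the rack vertex "1" is absent, but edges still point at it.
example = {
    "graph": {
        "nodes": [
            {"id": "0", "metadata": {"type": "cluster"}},
            {"id": "2", "metadata": {"type": "node"}},
        ],
        "edges": [
            {"source": "0", "target": "1"},
            {"source": "1", "target": "2"},
        ],
    }
}

dangling = find_dangling_edges(example)
for edge in dangling:
    print(f"dangling edge: {edge['source']} -> {edge['target']}")
```

To run it against a real file, load the file with `json.load` and pass the result to `find_dangling_edges`. An empty result would rule out this particular inconsistency only, not other problems in the JGF.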
After changing the match format to `rv1` and restarting Flux, errors are still occurring:
```
Oct 15 09:42:56 cluster1 flux[1877160]: sched-fluxion-resource.err[0]: run_remove: dfu_traverser_t::remove (id=364473021286581248): add_or_update: couldn't find vertex in graph...
```
How did you build your Flux installation? Or are you using an install built by someone else?
Yeah it would be good to have a test case somehow. I put this PR in flux-sched v0.38.0 via a patch so I don't think there's as much hurry to...
Some of my flux-coral2 tests are suggesting to me that the rabbit resources aren't freed, even though the error message goes away. I submitted a bunch of identical rabbit jobs...
> What are the scheduler and queue policies set to in the coral2 tests?

Whatever the defaults are, I think; there's no configuration done.

> Let's make a reproducer similar...
> @jameshcorbett this PR is almost ready for review. First, I'd like to integrate the tests you wrote that reproduced the behavior and helped me fix it. Can you make...