James Corbett
> After typing that up, I realized one issue with a special exit code is that exiting early from the prolog due to a potentially nonfatal error is probably not...
I'll check the match policy, but I'll also see if I can reproduce locally.
@behlendorf has been seeing this issue repeatedly on Hetchy. He submits a rabbit job, i.e. one that has `ssd` entries in the jobspec. However, all `ssd` vertices are marked down,...
I reloaded the resource and fluxion modules and scheduling went back to working as expected at first, but then as I ran jobs they eventually became stuck in SCHED. ```...
The issue seems to have been introduced between 0.36.1 and 0.37.0.
I can reproduce in the flux-coral2 environment locally or on LC clusters, but there are a bunch of plugins loaded. I think the simplest thing I have is the following....
@milroy I have a branch in my fork that repros the issue (https://github.com/jameshcorbett/flux-sched/tree/issue-1284). Interestingly, while fooling around with it, I noticed that the issue only comes up if the jobspec...
Ok @milroy I think I have an improved reproducer at https://github.com/jameshcorbett/flux-sched/tree/issue-1284
With this patch that @trws and I talked about

```
diff --git a/qmanager/policies/base/queue_policy_base.hpp b/qmanager/policies/base/queue_policy_base.hpp
index 6fa2e44d..e9fd1166 100644
--- a/qmanager/policies/base/queue_policy_base.hpp
+++ b/qmanager/policies/base/queue_policy_base.hpp
@@ -666,7 +666,7 @@ class queue_policy_base_t : public resource_model::queue_adapter_base_t...
```
OK, interesting. I will have to try it out. I just realized that we're switching the EAS clusters to the `rv1` match format, from `rv1_nosched`. Is partial cancel going to...