flux-core
view job constraints in `flux jobs`
I have a low-priority feature request. On a few occasions lately I've found myself digging around in the jobspec of individual jobs to see whether the user requested specific nodes. It would be nice to have a `constraints` field available with `flux jobs -o ...` that displays an appropriately joined representation of `jobspec.attributes.system.constraints`. It would probably also be useful to me if that field were included in the fields displayed by the built-in `-o deps` format (which I find generally very useful), and possibly also in the `contextual_info` field.
My first thought was whether we should have a `-o constraints` format (or something like it), similar to `deps`. The constraints could get long.
Support will have to be added to job-list to capture job constraints if we want to have general access to this data, since only the job and instance owner can fetch the whole jobspec.
Once we have that support, I don't know if the constraints object will be all that readable, especially with the added constraint for the queue property. I wonder if something more useful here might be an "eligible nodes" field, which would be easy to derive from any constraints object, whether it constrains by property, hostlist or rank. Unfortunately this would show the full queue nodelist for most jobs. Perhaps there's a way to compare the eligible nodes to all nodes in the defined queue, and suppress output if they are equivalent.
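The suppression idea can be sketched without any Flux calls, assuming the eligible and queue node names have already been expanded into plain Python collections (for example from the `nodelist` of a resource set). `eligible_nodes_field` is a hypothetical helper name, not something in flux-core:

```python
def eligible_nodes_field(eligible, queue_nodes):
    """Return a display string for a hypothetical 'eligible nodes' field.

    Both arguments are iterables of already-expanded hostnames. That is an
    assumption of this sketch; real code would start from resource sets.
    """
    eligible = set(eligible)
    queue_nodes = set(queue_nodes)
    # The job can run anywhere in its queue, so there is nothing worth showing.
    if eligible == queue_nodes:
        return ""
    # Otherwise show the restricted set in a stable order.
    return ",".join(sorted(eligible))
```

With this shape, most jobs would show an empty field and only jobs restricted to a subset of their queue would show a node list.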
The other approach would be to write some Python code that takes a constraint object and tries to represent it in a more human readable form. I guess that would be possible, but as @chu11 put it, it could get quite long.
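As a rough illustration of that second approach, here is a minimal sketch that walks a constraint dict and produces a compact string. It assumes the RFC 31 layout, where each key is an operator (`properties`, `hostlist`, `ranks`, `and`, `or`, `not`) mapped to a list of arguments. `format_constraint` is a hypothetical name, and joining multiple keys in one object with "and" is also an assumption:

```python
def format_constraint(constraint):
    """Render an RFC 31 style constraint dict as a compact string (sketch)."""
    if not constraint:
        return ""
    parts = []
    for op, args in constraint.items():
        if op in ("and", "or"):
            # Recurse into each operand and drop any that render empty.
            inner = [s for s in (format_constraint(c) for c in args) if s]
            joined = f" {op} ".join(inner)
            parts.append(f"({joined})" if len(inner) > 1 else joined)
        elif op == "not":
            inner = " and ".join(format_constraint(c) for c in args)
            parts.append(f"not ({inner})")
        else:
            # Leaf operators such as properties, hostlist, ranks.
            parts.append(f"{op}:{','.join(str(a) for a in args)}")
    # Assumption: several operators in one object are implicitly and-ed.
    return " and ".join(parts)
```

For example, `format_constraint({"and": [{"properties": ["batch"]}, {"hostlist": ["node[1-4]"]}]})` gives `(properties:batch and hostlist:node[1-4])`. Deeply nested constraints would still get long, so a real field would probably need truncation.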
BTW, in case this is useful in the interim, here's a script that checks all pending jobs and compares the set of eligible nodes to available nodes, issuing a warning if some nodes are down:
```python
import flux
from flux.job import JobList, JobKVSLookup, JobID, JobspecV1
from flux.resource import resource_list

handle = flux.Flux()
rlist = resource_list(handle)

# Index all pending jobs (for all users) by job ID
jobs = {x.id: x for x in JobList(handle, user="all", filters=["pending"]).jobs()}

# Fetch the jobspec for each pending job
lookup = JobKVSLookup(handle, ids=jobs.keys())
lookup.fetch_data()

rset = rlist.get()
for result in lookup.data():
    job = jobs[result["id"]]
    jobspec = JobspecV1(**result["jobspec"])
    constraint = jobspec.getattr("system.constraints")
    # Nodes that satisfy this job's constraint
    eligible = rset.all.copy_constraint(constraint)
    if len(eligible.nodelist) == 0:
        print(f"{job.id} needs {job.nnodes}N but no nodes match its constraint")
    else:
        down = eligible & rset.down
        up = eligible - down
        # Warn when too few eligible nodes are up
        if len(up.nodelist) < job.nnodes:
            print(f"{job.id} needs {job.nnodes}N from {eligible.nodelist} but {l

# vi: ts=4 sw=4 expandtab
```
Sorry, I dropped this for a bit. I hear you on the full constraints object being potentially very unwieldy. That said, even something to indicate that there are extra constraints would be helpful. E.g. something that just said constraints:nodelist or constraints:property in the info field to help point users or support staff in the right direction with the all too common "why isn't my job running?" question.
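A short indicator like that could be derived by collecting only the leaf operator names from the constraint object. The sketch below assumes the RFC 31 dict layout. `constraint_kinds` and `constraints_info` are hypothetical names, and the output uses the operator names as they appear in the constraint (so `hostlist` rather than `nodelist`):

```python
LOGICAL_OPS = {"and", "or", "not"}


def constraint_kinds(constraint):
    """Return the set of leaf operator names used anywhere in a constraint."""
    kinds = set()
    for op, args in (constraint or {}).items():
        if op in LOGICAL_OPS:
            # Logical operators only wrap other constraints, so recurse.
            for sub in args:
                kinds |= constraint_kinds(sub)
        else:
            kinds.add(op)
    return kinds


def constraints_info(constraint):
    """Build a short 'constraints:...' hint, or '' when nothing is constrained."""
    kinds = sorted(constraint_kinds(constraint))
    return "constraints:" + ",".join(kinds) if kinds else ""
```

Note that this does not separate system-applied constraints from user-supplied ones, so a queue property constraint would show up as `constraints:properties` on its own.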
We'll have to figure out if there are any extra constraints since the queue constraints are applied to any existing constraints with "and" in the job frobnicator. Possibly it may have been better to keep the system-applied constraints separate from the user supplied constraints in jobspec, to make this easier. For now, job-list may have to compare the signed jobspec vs the job-manager's possibly modified jobspec.
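The comparison could look roughly like the following, operating on two plain jobspec dicts. This is only a sketch: it assumes both the signed and the possibly modified jobspec are already available as dicts, and `get_constraints` and `classify_constraints` are hypothetical helpers, not job-list code:

```python
def get_constraints(jobspec):
    """Return attributes.system.constraints from a jobspec dict, or {}."""
    attributes = jobspec.get("attributes") or {}
    system = attributes.get("system") or {}
    return system.get("constraints") or {}


def classify_constraints(signed_jobspec, final_jobspec):
    """Separate what the user asked for from what the system changed.

    signed_jobspec: the jobspec as originally submitted by the user.
    final_jobspec:  the jobspec after any modification at ingest.
    """
    user = get_constraints(signed_jobspec)
    final = get_constraints(final_jobspec)
    return {
        "user": user,                      # user-supplied constraints only
        "system_modified": final != user,  # True if something was added or changed
    }
```

A job whose `user` entry is non-empty has extra constraints beyond whatever its queue imposes, which is the case worth flagging.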
If eventually Fluxion could supply the most salient "reason" a job is pending, that could satisfy some of this use case too.