flux-core
flux-core copied to clipboard
job-info: return job state as number encourages hardcoding values in shell scripts
Problem: job-info.list
returns job states as numerical values. This encourages people parsing the output in a bash script with jq
to hard code the enum values.
It would be better IMHO to return the state names (strings) from RFC 21.
That might work out better in the python bindings too...
Seems like a good idea, instead of the user converting numeral to string.
Another approach would be to add flux job statestr N
utility that wraps flux_job_statetostr()
and allow scripts to use that. I'm ambivalent either way. (though if states ever start getting other flags masked in, sending an integer would start to be much more compact, instead of just a little more compact)
That's maybe a better idea, thanks!
Granted flux job
is a plumbing tool, but I just have trouble seeing the use case where someone needs the number of a job state returned from job-info.list
, so I like the idea of returning the string.
If we really need a numeral someday, we could just add a new attribute in the job-info.list
service, like state-number
or something.
I wasn't considering a requirement for a state number instead of string, but instead was considering the size of underlying json strings. Probably not an issue at this point since it would just add a few characters per job.
It occurred to me this morning that switching to a state string may be a little more inconvenient in C/C++ code. E.g. @dongahn's MPIR PR we have:
if (state > FLUX_JOB_RUN)
We'd likely at least want to add flux_job_stringtostate (s)
to the API if state was returned as a string, though the state as integer seems much more convenient here.
OK, I'm thinking now it may be less bad to keep the numbers. It will be irritating to have to convert in order to use it it in masks and switch statements in C.
if (state > FLUX_JOB_RUN)
This is probably a bit unsafe though, in case we ever add a state and tack it on the end numerically to avoid breaking ABI (and it doesn't "come after" RUN).
Would it be appropriate to formalize the numbers and maybe the single character abbreviations in RFC 21?
This is probably a bit unsafe though, in case we ever add a state and tack it on the end numerically to avoid breaking ABI (and it doesn't "come after" RUN).
Good point!
Would it be appropriate to formalize the numbers and maybe the single character abbreviations in RFC 21?
Good idea. I wonder if we'll also want to explicitly state that the order of the state numbers is meaningless, and provide some convenience functions to satisfy use cases like this. Actually maybe in the case pasted above (where flux job attach
wants to detect if a job has not proceeded past the RUN state), the code should use the virtual states (FLUX_JOB_PENDING | FLUX_JOB_RUNNING)
.
I'll comment in #2654 to that effect.
Note that I'm still open to switching job states away from state numbers. As long as we've got the use cases easily covered. (e.g. even a python script may want to test for a virtual state like "pending" . This could be done with state strings just as easily as state numbers, just a lot more string comparisons)
Actually maybe in the case pasted above (where flux job attach wants to detect if a job has not proceeded past the RUN state), the code should use the virtual states (FLUX_JOB_PENDING | FLUX_JOB_RUNNING).
Uh I guess that should have been FLUX_JOB_PENDING | FLUX_JOB_RUN
, because the CLEANUP state would also be invalid for the debugger attach case.
This issue has been automatically marked as stale because it has not had activity for 365 days. It will be closed if no further activity occurs within 14 days. Thank you for your contributions.
i'm wondering if this issue should be resurrected given discussion in #4273, #4313, and #4234
- we may want to dump large amounts of job data into the archive in 1 big json blob, so that's a lot of hard coded values going into the archive
- i have a WIP branch that adds a new "states_mask" (bitmask of states that have been seen by a job) that would probably be needed for archiving, so thats more hardcoded stuff going into the archive.
- additional job states may be coming: https://github.com/flux-framework/rfc/pull/306