WIP: enable RFC 14 resource ranges
This is my initial attempt to investigate what is required to enable resource ranges in jobspec à la RFC 14. Very rough at this point (in particular there is lots of work to do on the testsuite) but I am interested in discussion about some potential design considerations that have come up. The code to this point allows jobspec with ranges that is otherwise V1 compliant to be allocated and run successfully, but at least the job-list service still issues a non-fatal error, and I've just ignored sched-simple at this point.
(update: the discussion in this paragraph has now been addressed by #6682) Firstly, because flux-sched already largely supports ranges, once a couple of explicit version checks are tweaked in libjob and shell, we can already get back a resource set allocation from fluxion. From a design perspective, my personal opinion about these version checks is that it should be exclusively the remit of the validator to actually fail jobspec purely based on the version number. If a jobspec passes validation, then it seems to me that other components should at most warn about unsupported version numbers, but otherwise try their best to handle the rest of the jobspec (in future, I suppose a version check could also determine different code pathways for different versions). This would allow someone in future to define an extension of a supported version and not have other components arbitrarily fail even when all the information they require is present and conformant.
Secondly, the lack of information about slots in RFC 20 resource sets is problematic. In short, in my opinion the shell really shouldn't have to parse the jobspec resource section at all to determine what resources it will run on, it should just be able to consult the R allocation returned by the scheduler. flux-sched can already handle ranges for any/all of the resources in V1 jobspec, but if a range is used for the slot resource then the slot count decided by the scheduler is not recorded in R. If a fixed count was used for the cores then the slot count can be determined fairly easily in V1, but if a range was also used for the cores then the shell has to re-determine (guess, really) what the chosen slot and core counts should be (e.g. if 6 cores are allocated per node in R, then you could potentially have slot:core counts of 1:6, 2:3, 3:2, or 6:1).
In my mind the ideal long-term solution is to have slot information included in the allocated resource set; I know it's a "virtual" resource, but it's still part of the scheduler's decision making, and especially looking ahead to more complicated possibilities of canonical jobspec it seems like it will definitely be important to preserve the slot information in the resource graph allocated by the scheduler.
For now, I've added infrastructure to the shell initialization that ~~guesses~~ resolves a slot:core count combination if needed, but it perhaps raises the question of whether it even makes sense to allow ranges for both slot and core resources simultaneously? Philosophically, I generally prefer having fewer restrictions, lest some future user come up with a strange use case one never predicted, but requiring a fixed count for at least one of the slot or core resources does greatly simplify things at this point, and it doesn't seem too restrictive to forbid leaving it entirely to the scheduler to decide what combination of number of slots and cores are allocated!
Thanks for your well considered comments and for taking the time to study the Flux design. Some of the details have spilled out of our caches (mine at least) so there may be a bit of delay paging it back in - apologies in advance.
If a jobspec passes validation, then it seems to me that other components should at most warn about unsupported version numbers, but otherwise try their best to handle the rest of the jobspec (in future, I suppose a version check could also determine different code pathways for different versions). This would allow someone in future to define an extension of a supported version and not have other components arbitrarily fail even when all the information they require is present and conformant.
That seems like a sensible goal to me as long as we can keep the production systems on well trodden/tested pathways with the validator.
the shell really shouldn't have to parse the jobspec resource section at all to determine what resources it will run on, it should just be able to consult the R allocation returned by the scheduler.
I think perhaps that was the original plan when jobspec was designed, but IRL we found we needed to parse in the shell to generate the taskmap. Also in the flux-coral2 shell plugin for Cray PMI.
@garlick Thank you for the reply! I know this stuff probably isn't high priority for you guys what with El Capitan in focus, but I really appreciate the willingness to engage on other ideas too :)
I think perhaps that was the original plan when jobspec was designed, but IRL we found we needed to parse in the shell to generate the taskmap. Also in the flux-coral2 shell plugin for Cray PMI.
I actually think these are both good examples of how currently the only reason these functions are going to the jobspec for resource information at all is because the slot information wasn't available in R. This means the task_count, nslots, or cores_per_slot values needed to initially be retrieved/computed from the slot request in the resource section of the jobspec. For V1, having even just the slot_count/nslots (whatever name one chooses) also recorded in R would mean that all the necessary information could be computed purely from R and held in rcalc, and the shell (and associated plugins) could ignore the resource section of the jobspec entirely, at least for all the cases I've seen or can think of. Other jobspec sections such as tasks will of course still be parsed in.
Rebased after #6682 so the remaining changes are focused on the actual work towards enabling resource ranges.
@garlick (and others, of course!) pursuant to the earlier discussion of including slot count information in R, it would be great if you have a minute to take a look at my proof-of-concept branch https://github.com/sam-maloney/flux-core/tree/R-nslots
Basically, it modifies sched-simple to emit an integer nslots into the execution dict of R, which if present is picked up by the shell init to avoid basically all parsing of the jobspec resources. This would of course also require changes to flux-sched to maintain consistency, but I would love to hear opinions on such a change. I think overall it nicely demonstrates how such a simple addition to R makes a lot of other data flows much simpler and more consistently separates the concepts of jobspec.resources as an abstract request vs R as a concretized allocation!
Agreed! Unless I'm missing something obvious, including the slot count in R is an excellent design improvement, and the changes to support it in flux-core appear minimal enough that we could probably tack it onto Rv1.
Would you be willing to submit a PR to RFC 20? We'll want to get feedback from @trws and @milroy on how easy this is to do in the fluxion scheduler. If they see any issues they can comment here of course!
@garlick @grondo
I think this PR is ready for review now (other than the timeout in CI...), as flux-framework/rfc#466 seems to have garnered agreement.
For now nslots is optional and the code for determining it from the Jobspec resources is still there as fallback. Once it is also implemented in flux-sched then the fallback could eventually be deprecated and removed.
Codecov Report
:white_check_mark: All modified and coverable lines are covered by tests.
:white_check_mark: Project coverage is 84.07%. Comparing base (3fdc8e3) to head (e4670f0).
:warning: Report is 570 commits behind head on master.
Additional details and impacted files
@@ Coverage Diff @@
## master #6632 +/- ##
==========================================
+ Coverage 84.04% 84.07% +0.03%
==========================================
Files 544 544
Lines 91835 91856 +21
==========================================
+ Hits 77180 77227 +47
+ Misses 14655 14629 -26
| Files with missing lines | Coverage Δ | |
|---|---|---|
| src/common/librlist/rlist.c | 86.35% <100.00%> (+0.07%) |
:arrow_up: |
| src/shell/jobspec.c | 96.22% <100.00%> (+1.48%) |
:arrow_up: |
| src/shell/rcalc.c | 84.64% <100.00%> (+0.12%) |
:arrow_up: |
:rocket: New features to boost your workflow:
- :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.