shell: determine node & core resource counts from R during init
Problem: counts for resources are currently being parsed only from jobspec, but if/once ranges are used then values cannot be determined.
Modify shell init data flow to use the actual allocation from the scheduler in R, if present, to determine resource counts for nodes & cores in jobspec_parse.
As a first step, this checks whether a resource set struct was passed to jobspec_parse, and otherwise leaves the original codepath unchanged. The slot count must still be parsed from the jobspec regardless, because it is missing from R.
The concept for this is also essentially separated off from #6632, albeit with a fair bit of re-working to keep the changes as minimal as possible. Because flux-sched already has basic support for ranges, this PR already allows jobspec with ranges for nodes and/or cores to run successfully.
This is great! The changes look minimal and understandable so I'll try to get a review done soon. Thanks!
@grondo perhaps a gentle nudge for a review now that the May release is complete :)
Thanks for the nudge @sam-maloney!
This looks good to me. However, one observation is that it appears the "old" code path (recursive_parse_jobspec_resources () which gathers the counts via jobspec only) is now unused except for the unit tests in src/shell/test/jobspec.c. I wonder if we should just drop that code and update the unit tests to provide an R?
What do you think? It is also possible I misread the code and there is some use case still present outside of the unit test. Even in that case though, it might be nice to have a structure to test with provided simple R/rcalc values.
@grondo thanks for having a look!
After a quick search I actually think you're absolutely correct that the only use of the old code path would now be the test suite; I think my original intent was to keep the changes absolutely minimal, but if you think the approach overall seems sound then I am happy to drop the old code and put in some time to update the unit tests.
Let me know if you have any other thoughts, otherwise I'll ping you when I have an update ready!
@grondo alright, I think I have a working update ready; tests are passing on my machine so hopefully the CI agrees!
I've removed the unused code, and added a simple resource set to each test which matches with the resources requested by the jobspec being tested (or at least something reasonable for tests that are supposed to fail).
I've completely removed a number of the bad tests that were checking for certain disallowed resource configurations, since part of the argument driving this work is that the shell shouldn't really be parsing the jobspec resources but rather leave it to the validator and/or scheduler.
The only previously good test which now fails (and I've also removed) was the "node[2]->socket[3]->slot[5]->core[3]". Again, since the impetus for this is allowing ranges to be specified, I'm intentionally avoiding the assumption that any of the counts are integers (other than for the slots), so having a resource with a non-unity count between the node and slot (if node is given) means there's no easy way to know the multiplier. This (hopefully non-deal-breaking) restriction is documented in the comment above the recursive parsing function.
And fingers crossed that down the road, if/once slot information can be included in R, then the remaining resource parsing can also be dropped entirely!
Apologies @sam-maloney, I dropped the ball here :disappointed:
Once #6930 is merged we can rebase and merge this one too.
Apologies @sam-maloney, I dropped the ball here 😞
No worries @grondo; with our project officially starting it's had the paradoxical effect of diverting me from working on flux with lots of meetings and admin work as everything gets going, haha!
Codecov Report
:x: Patch coverage is 97.56098% with 1 line in your changes missing coverage. Please review.
:white_check_mark: Project coverage is 83.91%. Comparing base (bd1ece4) to head (6ab3719).
:warning: Report is 700 commits behind head on master.
| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/shell/info.c | 66.66% | 1 Missing :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## master #6705 +/- ##
==========================================
+ Coverage 83.89% 83.91% +0.01%
==========================================
Files 540 540
Lines 90707 90676 -31
==========================================
- Hits 76099 76091 -8
+ Misses 14608 14585 -23
| Files with missing lines | Coverage Δ | |
|---|---|---|
| src/shell/jobspec.c | 94.73% <100.00%> (+12.99%) |
:arrow_up: |
| src/shell/info.c | 75.86% <66.66%> (ø) |
:rocket: New features to boost your workflow:
- :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.