flux-core
jobspec: allow multiple 'slot' entries, particularly above 'node'
The following jobspec
{
  "resources": [
    {
      "type": "slot",
      "count": 1,
      "label": "rabbit",
      "with": [
        {
          "type": "node",
          "count": 1,
          "exclusive": true,
          "with": [
            {
              "type": "slot",
              "count": 1,
              "with": [
                {
                  "type": "core",
                  "count": 1
                }
              ],
              "label": "task"
            }
          ]
        }
      ]
    }
  ],
  "tasks": [
    {
      "command": [
        "hostname"
      ],
      "slot": "task",
      "count": {
        "per_slot": 1
      }
    }
  ],
  "attributes": {
    "system": {
      "duration": 0,
      "environment": {},
      "shell": {}
    }
  },
  "version": 1
}
is rejected by the shell with the error "jobspec: node resource encountered after slot resource". The shell is correctly enforcing a V1 restriction. However, given the way rabbit resources are currently organized, it would be very useful to allow a top-level 'slot' entry above 'node' and 'ssd'. Fluxion understands such jobspecs and, according to @grondo, sched-simple does as well.
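For illustration, the restriction being enforced can be sketched in a few lines of Python. This is not the shell's actual parser, only a minimal sketch of the rule; the function name `find_node_below_slot` and the two sample resource lists are made up for this example.

```python
def find_node_below_slot(resources, inside_slot=False):
    """Return True if any 'node' vertex is nested beneath a 'slot' vertex.

    Hypothetical helper: it only mirrors the V1 rule discussed in this
    issue and is not taken from the flux-core source.
    """
    for res in resources:
        rtype = res.get("type")
        if rtype == "node" and inside_slot:
            return True
        # Once a slot has been seen, everything under it counts as "inside".
        if find_node_below_slot(res.get("with", []),
                                inside_slot or rtype == "slot"):
            return True
    return False


core = {"type": "core", "count": 1}
task_slot = {"type": "slot", "count": 1, "label": "task", "with": [core]}
node = {"type": "node", "count": 1, "exclusive": True, "with": [task_slot]}

# Resource section of the jobspec above: slot -> node -> slot -> core
nested = [{"type": "slot", "count": 1, "label": "rabbit", "with": [node]}]
# Conventional V1 layout: node -> slot -> core
v1_style = [node]

print(find_node_below_slot(nested))
print(find_node_below_slot(v1_style))
```

Under this sketch the jobspec above trips the check because of the outer 'rabbit' slot, while the same resources without that wrapper pass.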
Thoughts on whether we could enable this functionality / disable this check in the shell?
There was some work by @SteVwonder a few years ago in the job shell jobspec parser to support non-V1 jobspecs, e.g. #3160 and #3175. However, the job shell now explicitly checks for version == 1 and rejects anything else, so I'm not sure how a non-V1 jobspec is meant to be processed by the shell.
I'm not sure if we need to draft a V2 (or Vn), or extend V1, etc. Looking for any opinions here... :-)
At the moment this is mostly needed for flux-coral2 testing, because in actual usage jobspecs like the one above are constructed after submission, and only the job-manager's copy (which it sends to the scheduler) is modified. However, as @grondo mentioned on a call, the copy the shell receives will still be the user's original, signed version. So I think this issue shouldn't be seen in production under the current design of flux-coral2.
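To make the distinction between the two copies concrete, here is a rough Python sketch of the kind of transformation described: the resource section is wrapped in a top-level 'slot' on a copy, and the original stays untouched. The helper `wrap_in_slot` and its `label` parameter are hypothetical and are not flux-coral2's actual code.

```python
import copy


def wrap_in_slot(jobspec, label="rabbit"):
    """Return a copy of jobspec with its resources nested under a new slot.

    Hypothetical illustration only: the input dict is left unmodified,
    loosely mirroring how the user's original jobspec is unchanged.
    """
    wrapped = copy.deepcopy(jobspec)
    wrapped["resources"] = [
        {
            "type": "slot",
            "count": 1,
            "label": label,
            "with": wrapped["resources"],
        }
    ]
    return wrapped


original = {
    "resources": [
        {
            "type": "node",
            "count": 1,
            "exclusive": True,
            "with": [
                {
                    "type": "slot",
                    "count": 1,
                    "label": "task",
                    "with": [{"type": "core", "count": 1}],
                }
            ],
        }
    ],
    "version": 1,
}

modified = wrap_in_slot(original)
print(original["resources"][0]["type"])
print(modified["resources"][0]["type"])
```

In this sketch the original still starts with 'node' and so stays within the V1 restriction, while the modified copy starts with the new 'slot'. A test that submits the modified form directly hands the nested layout to the shell, which is the case this issue is about.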
Related: #3310