flux-core

jobspec: allow multiple 'slot' entries, particularly above 'node'

Open jameshcorbett opened this issue 1 year ago • 3 comments

The following jobspec

{
  "resources": [
    {
      "type": "slot",
      "count": 1,
      "label": "rabbit",
      "with": [
        {
          "type": "node",
          "count": 1,
          "exclusive": true,
          "with": [
            {
              "type": "slot",
              "count": 1,
              "with": [
                {
                  "type": "core",
                  "count": 1
                }
              ],
              "label": "task"
            }
          ]
        }
      ]
    }
  ],
  "tasks": [
    {
      "command": [
        "hostname"
      ],
      "slot": "task",
      "count": {
        "per_slot": 1
      }
    }
  ],
  "attributes": {
    "system": {
      "duration": 0,
      "environment": {},
      "shell": {}
    }
  },
  "version": 1
}

is rejected by the shell with the error "jobspec: node resource encountered after slot resource". The shell is correctly enforcing a V1 restriction. However, given how rabbit resources are currently organized, it would be very useful to allow a top-level 'slot' entry above 'node' and 'ssd'. Fluxion understands such jobspecs, and according to @grondo, sched-simple does as well.
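
For illustration, the restriction can be sketched roughly as follows. This is a minimal Python sketch of the idea only, not the shell's actual (C) implementation: the function names `check_v1_ordering` and `is_accepted` are hypothetical, and only the error text is taken from the report above.

```python
def check_v1_ordering(resources, under_slot=False):
    """Walk a jobspec 'resources' tree and raise ValueError if a 'node'
    entry appears anywhere beneath a 'slot' entry.

    Hypothetical helper: a sketch of the V1 ordering rule, not flux-core code.
    """
    for res in resources:
        rtype = res.get("type")
        if rtype == "node" and under_slot:
            raise ValueError("node resource encountered after slot resource")
        # Descend into children, remembering whether a slot has been seen
        # on the path from the root to this vertex.
        check_v1_ordering(res.get("with", []), under_slot or rtype == "slot")


def is_accepted(resources):
    """Return True if the resources tree passes the sketched check."""
    try:
        check_v1_ordering(resources)
    except ValueError:
        return False
    return True


# The resources section from the jobspec above: slot > node > slot > core.
rabbit_resources = [
    {"type": "slot", "count": 1, "label": "rabbit", "with": [
        {"type": "node", "count": 1, "exclusive": True, "with": [
            {"type": "slot", "count": 1, "label": "task", "with": [
                {"type": "core", "count": 1},
            ]},
        ]},
    ]},
]

# The same request without the top-level slot: node > slot > core.
plain_resources = rabbit_resources[0]["with"]
```

Under this sketch, `is_accepted(rabbit_resources)` fails because of the outer 'slot', while `is_accepted(plain_resources)` passes, which is the behavior this issue asks to relax.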

Thoughts on whether we could enable this functionality / disable this check in the shell?

jameshcorbett avatar May 10 '24 17:05 jameshcorbett

There was some work by @SteVwonder a few years ago in the job shell jobspec parser to support non-V1 jobspecs, e.g. #3160 and #3175. However, the job shell now explicitly checks for version == 1 and rejects anything without that version, so I'm not sure how a non-V1 jobspec is meant to be processed by the shell.
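
To make the concern concrete, the version gate amounts to something like the sketch below. This is an assumption-laden Python illustration, not the shell's code: `require_v1` and its error message are hypothetical.

```python
def require_v1(jobspec):
    """Reject any jobspec whose 'version' key is not exactly 1.

    Hypothetical helper sketching the 'version == 1' check described above.
    """
    version = jobspec.get("version")
    if version != 1:
        # Hypothetical message; the shell's real wording may differ.
        raise ValueError(f"unsupported jobspec version: {version!r}")
    return jobspec


v1_jobspec = {"version": 1, "resources": [], "tasks": [], "attributes": {}}
other_jobspec = {"version": 2, "resources": [], "tasks": [], "attributes": {}}
```

With a gate like this, `require_v1(v1_jobspec)` returns the jobspec unchanged and `require_v1(other_jobspec)` raises, so any non-V1 parsing support behind the gate would never be reached.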

I'm not sure if we need to draft a V2 (or Vn), or extend V1, etc. Looking for any opinions here... :-)

grondo avatar May 10 '24 17:05 grondo

At the moment this is mostly needed for flux-coral2 testing, because in actual usage jobspecs like the one above are constructed after submission, and only the job-manager's copy (which it sends to the scheduler) is modified. As @grondo mentioned on a call, the copy the shell receives will still be the user's original, signed version. So I think this issue shouldn't be seen in production under the current design of flux-coral2.

jameshcorbett avatar May 10 '24 21:05 jameshcorbett

Related: #3310

grondo avatar May 16 '24 03:05 grondo