flux-sched
feasibility check: reject jobs that exceed static node/core policy limits defined per RFC 33
We need a mechanism for enforcing static job limits at a per-queue level. For example, we may want to set a walltime limit of 12 hours on jobs submitted to batch and a walltime limit of 30 minutes on jobs submitted to debug.
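To make the example concrete, here is a minimal sketch of a per-queue walltime check. The table `QUEUE_DURATION_LIMITS` and the function `check_duration` are hypothetical names used only for illustration; they are not part of flux-core or flux-sched, and where the check would live is exactly the open question in this issue.

```python
# Hypothetical per-queue walltime limits in seconds, taken from the
# batch/debug example above. Not an actual Flux configuration.
QUEUE_DURATION_LIMITS = {
    "batch": 12 * 60 * 60,  # 12 hours
    "debug": 30 * 60,       # 30 minutes
}


def check_duration(queue, duration):
    """Return None if the job is acceptable, otherwise an error string."""
    limit = QUEUE_DURATION_LIMITS.get(queue)
    if limit is None:
        return None  # no limit configured for this queue
    if duration > limit:
        return f"duration {duration}s exceeds {queue} queue limit of {limit}s"
    return None
```

Under these assumptions a one-hour job would be accepted in batch and rejected in debug.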
Where should this be enforced? In a coffee time discussion, two modules were identified as possible places: job-ingest plugins and qmanager. In theory, enforcement can exist in both places, with job-ingest filtering out the most obvious, easy-to-check limits, while qmanager provides a robust, full-featured set of checks.
How should these limits be specified? Presumably this would be a configuration option in a config file, but would it be a module-specific config, or a generic config section? Should users be able to override these settings at module load time?
Should users be able to override these settings at module load time?
At the system level, the users will be sysadmins, and they might want to do this, since a module load-time option takes precedence over the configuration file. For qmanager, I am trying to make the config file options and the module load options match.
If we were to do the per-queue static limit check at the job-ingest plugin level, we might want qmanager to dump those limits into a well-known location for the plugin to use. If qmanager gets reloaded, it can update the info at this location so that the plugin picks up the updated limits.
If we arrange it this way, does flux-sched need to provide that plugin? Can we define a protocol so that a "generic" plugin can do various third-party limit checks?
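As a rough illustration of the arrangement discussed above, the sketch below has a scheduler-side function publish limits to a well-known file and a generic plugin-side function read them back on each check. Everything here is an assumption made for illustration: the JSON file format, the path, and the names `publish_limits` and `validate` are hypothetical, and a real Flux implementation might use the KVS or an RPC instead of a file.

```python
import json
import os
import tempfile


def publish_limits(path, limits):
    """Scheduler side (hypothetical): atomically write the limits to `path`."""
    dirname = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=dirname)
    with os.fdopen(fd, "w") as f:
        json.dump(limits, f)
    # Atomic rename, so a reader never sees a partially written file.
    os.replace(tmp, path)


def validate(path, queue, request):
    """Plugin side (hypothetical): re-read the limits and check one request.

    Returns a list of error strings; an empty list means the job passes.
    """
    with open(path) as f:
        limits = json.load(f)
    errors = []
    for key, maximum in limits.get(queue, {}).items():
        value = request.get(key)
        if value is not None and value > maximum:
            errors.append(f"{key}={value} exceeds {queue} limit {maximum}")
    return errors
```

Because the plugin side only compares named values against maxima, it needs no scheduler-specific knowledge, which is what would let one generic plugin serve limits published by a third party. Calling `publish_limits` again after a reload is enough for later checks to see the new values.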
Now that we have configurable limits defined in RFC 33 and enforcement at ingest by flux-core, I'll rename this issue to indicate the gap that remains. I think that gap is for the scheduler feasibility check to use its knowledge of configured resources to enforce static node/core limits when the jobspec omits one or the other. See, e.g., this note in flux-config-policy(5):
Limit checks take place before the scheduler sees the request, so it is possible to bypass a node limit by requesting only cores, or the core limit by requesting only nodes (exclusively) since this part of the system does not have detailed resource information. Generally node and core limits should be configured in tandem to be effective on resource sets with uniform cores per node. Flux does not yet have a solution for node/core limits on heterogeneous resources.
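The bypass described in that note, and the kind of check the scheduler could add, can be sketched as follows. This is an illustrative model only, not Fluxion's feasibility check: the function names, the dict-based request format, and the single `cores_per_node` value are assumptions, and the sketch covers only resources with uniform cores per node.

```python
def ingest_check(request, limits):
    """Ingest-style check: compares only the counts the request states."""
    for key in ("nnodes", "ncores"):
        value = request.get(key)
        if value is not None and key in limits and value > limits[key]:
            return False
    return True


def feasibility_check(request, limits, cores_per_node):
    """Scheduler-style check: derive the missing count, then apply limits.

    Assumes every node has exactly `cores_per_node` cores.
    """
    nnodes = request.get("nnodes")
    ncores = request.get("ncores")
    if nnodes is None and ncores is not None:
        # Ceiling division: the fewest nodes that can hold the cores.
        nnodes = -(-ncores // cores_per_node)
    elif ncores is None and nnodes is not None:
        # An exclusive node request receives every core on each node.
        ncores = nnodes * cores_per_node
    return ingest_check({"nnodes": nnodes, "ncores": ncores}, limits)
```

With a node limit of 2 and 16 cores per node, a request for 64 cores alone passes `ingest_check` but fails `feasibility_check`, because it needs at least 4 nodes.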