flux-sched

Preemptible jobs

Open grondo opened this issue 1 year ago • 1 comment

AKA 'standby' qos / queue. Allow users to submit jobs that can be killed automatically by the system instance if another job needs the resources.

Currently creating this as an issue in flux-sched, since the scheduler may be the only component that knows when it would need to request termination of a preemptible/standby job to free up resources for a higher priority job.

There may be some components necessary in flux-core to make this work end-to-end though (e.g. how to create a standby queue or otherwise mark a job as preemptible). We can link these here as we create them.

grondo May 24 '23 02:05

Quick discussion with @garlick led to the following possible implementation:

  • add a new preemptible job flag (similar to the existing waitable and debug flags)
  • a scheduler which implements job preemption can then:
    • ignore any timelimit on jobs marked as preemptible when doing backfill
    • raise an exception on preemptible jobs when they are preventing a higher priority job from running

For testing, perhaps bare-minimum support could be added to sched-simple: if the job at the front of the queue can't currently be run, check whether killing any preemptible jobs would free enough resources to run it. This would be simple to implement and would allow us to test "preemption" standalone in the flux-core testsuite.
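To make the sched-simple idea concrete, here is a rough sketch of the check in Python. It is only an illustration, not sched-simple code: the `Job` class, the `select_victims` helper, and the use of a single core count as the resource model are all assumptions made for the example (a real scheduler would match against structured resource sets), and the largest-first order is just one possible heuristic.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical job record; resources are reduced to a core count."""
    jobid: int
    ncores: int
    preemptible: bool = False


def select_victims(pending, running, free_cores):
    """Pick running preemptible jobs to kill so that `pending` can start.

    Returns [] if `pending` already fits in `free_cores`, a list of jobs
    to preempt if killing them frees enough cores, or None if even
    killing every preemptible job would not be enough.
    """
    needed = pending.ncores - free_cores
    if needed <= 0:
        return []
    # Assumed heuristic: try the largest preemptible jobs first so that
    # fewer jobs are killed. Other orders (e.g. newest first) are possible.
    candidates = sorted(
        (job for job in running if job.preemptible),
        key=lambda job: job.ncores,
        reverse=True,
    )
    victims = []
    for job in candidates:
        if needed <= 0:
            break
        victims.append(job)
        needed -= job.ncores
    return victims if needed <= 0 else None


running = [Job(1, 4), Job(2, 2, preemptible=True), Job(3, 3, preemptible=True)]
head_of_queue = Job(4, 4)
victims = select_victims(head_of_queue, running, free_cores=1)
print([job.jobid for job in victims])
```

The scheduler would then raise an exception on each returned job, as described in the list above, and retry the job at the front of the queue once the resources are released. Returning None lets the caller leave the running preemptible jobs alone when preemption would not help.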

grondo May 24 '23 20:05