az-hop
feat: add warm queues
fixes #1172
Warm queues always hold at least one idle node so that new jobs can start instantly. The cron job that checks for this runs every 5 minutes, on workdays only.
This is the setup we've been using for SLURM. It works (and is a big productivity win in practice), but it would need to be generalized for other schedulers.
Of course, this is not the ideal solution, since it creates some noise in the job scheduler (the job counter goes up). Ideally, CycleCloud or the scheduler itself could be configured to take care of this (and perhaps there is already a way to get approximately what we do here).
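For readers who want a feel for the mechanism, here is a minimal Python sketch of such a check. It is not the code in this PR: the function names, the `warmup` job name, and the choice of a no-op `sbatch --wrap` job to trigger a node are all assumptions made for illustration. It assumes the check is driven by a crontab entry such as `*/5 * * * 1-5` (every 5 minutes, Monday to Friday).

```python
import subprocess


def count_idle_nodes(sinfo_output):
    """Sum the node counts printed by `sinfo -h -t idle -o %D` (one integer per line)."""
    return sum(int(tok) for tok in sinfo_output.split() if tok.isdigit())


def needs_warm_node(idle_nodes, min_idle=1):
    """True when the queue holds fewer idle nodes than the warm minimum."""
    return idle_nodes < min_idle


def build_warmup_command(partition):
    """Hypothetical no-op job whose only purpose is to make a node start."""
    return ["sbatch", "--partition", partition, "--nodes", "1",
            "--job-name", "warmup", "--wrap", "true"]


def keep_warm(partition, run=subprocess.run):
    """Submit a warm-up job if the partition has no idle node. Returns True if submitted."""
    result = run(["sinfo", "-h", "-p", partition, "-t", "idle", "-o", "%D"],
                 capture_output=True, text=True, check=True)
    if needs_warm_node(count_idle_nodes(result.stdout)):
        run(build_warmup_command(partition), check=True)
        return True
    return False
```

Each warm-up submission is a real job, which is where the scheduler noise mentioned above comes from. A fuller version would probably also skip the submission while an earlier warm-up job is still pending, since a node may take longer than 5 minutes to start.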
This is interesting enough to be mainstream. Can you please also update config.tpl.yml and the documentation?