
Potentially incorrect information on SLURM-examples page

Open novosirj opened this issue 7 years ago • 0 comments

Hi there,

I was looking for information on this subject when I came across the SLURM-examples page (https://github.com/statgen/SLURM-examples), which says the following:

"scontrol show job -dd <job_id>. Shows all information about specific SLURM job. It is worth paying attention to the following information:

Requeue. Shows how many times your job was re-queued. Some jobs may have higher priority and may pre-empt (i.e. cancel) your running jobs and put them back to the queue. If your job takes too long time and Requeue is greater than 1 then, most probably, the reason why your job takes so long is because it was cancelled and re-queued several times."

I briefly thought, wow, I learned something new, but I don't believe it's true. Per the scontrol manual (https://slurm.schedmd.com/scontrol.html):

Requeue=<0|1> Stipulates whether a job should be requeued after a node failure: 0 for no, 1 for yes.

That's from the "update" section of the scontrol manual, but none of my jobs shows anything other than Requeue=0 or Requeue=1. I looked briefly at the source code, but couldn't tell for certain; I may have been looking in the wrong place.
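
For anyone who wants to repeat the check, here is a rough sketch of collecting the Requeue value per job from `scontrol show job`-style text. The sample text, job IDs, job names, and the `requeue_values` helper are all made up for illustration; the parsing only assumes whitespace-separated key=value fields with records separated by blank lines.

```python
import re

# Hypothetical sample, shaped like `scontrol show job` key=value output.
# Job IDs and names are invented for illustration.
SAMPLE = """\
JobId=1001 JobName=align
   Requeue=1 Restarts=0 BatchFlag=1

JobId=1002 JobName=call
   Requeue=0 Restarts=0 BatchFlag=1
"""


def requeue_values(text):
    """Return {job_id: requeue_value} parsed from key=value records."""
    result = {}
    # Assumption: job records are separated by blank lines.
    for record in re.split(r"\n\s*\n", text.strip()):
        # Collect every key=value pair in the record into a dict.
        fields = dict(re.findall(r"(\w+)=(\S+)", record))
        if "JobId" in fields and "Requeue" in fields:
            result[fields["JobId"]] = fields["Requeue"]
    return result


values = requeue_values(SAMPLE)
print(values)
# Distinct Requeue values seen across all parsed jobs.
print(sorted(set(values.values())))
```

On a real cluster the same function could be fed the captured output of `scontrol show job -dd` instead of the sample string. If the set of distinct values never contains anything besides 0 and 1, that is consistent with Requeue being a flag rather than a counter.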

novosirj, Jun 19 '18 15:06