
SLURM: questions about job scheduling and BenchExec run representation

Open · sim642 opened this issue 10 months ago

/cc @leventeBajczi

I'll preface this by saying that I have minimal familiarity with SLURM, so I might misunderstand something here. For a long time I have wondered about the possibility of using BenchExec on our university HPC cluster, which is accessed via SLURM, so it's nice to see some support for it now.

I see that the SLURM executor uses some kind of work queue in its Python implementation: https://github.com/sosy-lab/benchexec/blob/64d73c47e05a1487727c4777e23863ce4ed4851a/contrib/slurm/slurmexecutor.py#L119-L124
Also, there is a -N flag for parallelism here: https://github.com/sosy-lab/benchexec/tree/64d73c47e05a1487727c4777e23863ce4ed4851a/contrib/slurm#usage

Does it just appear to me that way or is the SLURM extension using SLURM in some non-standard way? Namely, SLURM itself is a job queue, so why is there another worker pool (with a queue and size) in this Python support layer?

I would have thought that all BenchExec runs would simply be submitted to SLURM as batch jobs, and SLURM would take care of their scheduling and of how much parallelism is available. But something else seems to be happening here: the BenchExec SLURM extension seems to use a work queue just for submitting jobs to SLURM and waits for them to finish before submitting new ones, no? If so, what is the reason for doing that instead of just submitting everything at once?
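
For concreteness, this is roughly what I would naively have expected (just a sketch under my possibly-wrong assumptions, with placeholder run commands, not how the extension actually works):

```python
import subprocess

# Placeholder run commands; BenchExec would generate the real ones.
runs = ["./tool task1.yml", "./tool task2.yml"]

for i, cmd in enumerate(runs):
    # sbatch only queues the job and returns immediately, so all runs
    # end up in SLURM's queue and its scheduler decides how many
    # of them execute in parallel.
    subprocess.run(
        ["sbatch", f"--job-name=benchexec-run-{i}", f"--wrap={cmd}"],
        check=True,
    )
```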

I also gather that SLURM has not just jobs, but job steps, tasks and job arrays (although I don't understand their differences). Should BenchExec runs perhaps not be individual SLURM jobs at all, but some smaller processing unit?
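
For example, if I understand job arrays correctly (and I may well not), a single array job with a concurrency limit could cover a whole set of runs while still leaving the scheduling to SLURM. A rough, untested sketch of what I mean:

```python
import subprocess

runs = ["./tool task1.yml", "./tool task2.yml", "./tool task3.yml"]  # placeholders
limit = 8  # at most this many array tasks run at the same time

# One run command per line; each array task picks its own line by index.
with open("runs.txt", "w") as f:
    f.write("\n".join(runs) + "\n")

script = """#!/bin/sh
# Execute the line of runs.txt matching this task's (1-based) array index.
cmd=$(sed -n "${SLURM_ARRAY_TASK_ID}p" runs.txt)
exec sh -c "$cmd"
"""
with open("array.sh", "w") as f:
    f.write(script)

# --array=1-<n>%<limit>: queue all tasks at once,
# but run at most <limit> of them simultaneously.
subprocess.run(
    ["sbatch", f"--array=1-{len(runs)}%{limit}", "array.sh"],
    check=True,
)
```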

sim642 avatar Mar 29 '24 10:03 sim642

Does it just appear to me that way or is the SLURM extension using SLURM in some non-standard way? Namely, SLURM itself is a job queue, so why is there another worker pool (with a queue and size) in this Python support layer?

Yes, it would probably be possible to use SLURM for queueing all the tasks at once and to simplify this module. This way, however, we do not flood the queue with potentially tens of thousands of tasks, but only ever with at most -N of them. For me, this was a big concern, because there have previously been problems with exactly this at the HPC cluster I have access to.
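
The pattern is essentially a fixed-size pool of workers, each of which submits one run and waits for it to finish before taking the next. A simplified sketch of the idea (not the actual slurmexecutor.py code; run commands are placeholders):

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

N = 8  # the -N option: at most this many runs are in SLURM at any time
runs = ["./tool task1.yml", "./tool task2.yml"]  # placeholder run commands

def execute(cmd):
    # srun blocks until the job finishes, so each worker keeps
    # at most one job in the SLURM queue at a time.
    return subprocess.run(["srun", "sh", "-c", cmd], check=True)

with ThreadPoolExecutor(max_workers=N) as pool:
    results = list(pool.map(execute, runs))
```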

I also gather that SLURM has not just jobs, but job steps, tasks and job arrays (although I don't understand their differences).

This makes two of us :) I did not find any advantage in using steps instead of jobs. Also, I went down the path of least resistance when it comes to dealing with SLURM's un(der)documented behaviors: I wanted something that works before we aim for something optimal. If you have any ideas on how to improve this part, I'm open to changing it!

Also, thanks for your interest in BenchExec's SLURM extension, and feel free to tag me in any issues you may find (I'm sure there will be plenty). Right now, to the best of my knowledge, I'm the only one using this extension, so the more of us there are, the quicker we can find and fix its bugs.

leventeBajczi avatar Mar 29 '24 14:03 leventeBajczi

Is there something left open or can we close this?

PhilippWendler avatar Jun 07 '24 05:06 PhilippWendler