Preferred syntax for one-command-per-line job arrays
Option 1: (used by https://github.com/statgen/SLURM-examples/blob/master/job-array-one-command-per-line)
```bash
#!/bin/bash
#SBATCH --array=1-2
declare -a commands
commands[1]="Rscript myscript.R input_file_A.txt"
commands[2]="Rscript myscript.R input_file_B.txt"
bash -c "${commands[${SLURM_ARRAY_TASK_ID}]}"
```
Option 2:
```bash
#!/bin/bash
#SBATCH --array=1-2
commands=(
    "Rscript myscript.R input_file_A.txt"
    "Rscript myscript.R input_file_B.txt"
)
# note: this array is 0-indexed, so shift the 1-based task ID down by one
bash -c "${commands[$((SLURM_ARRAY_TASK_ID - 1))]}"
```
Option 3:
```bash
#!/bin/bash
#SBATCH --array=1-2
read -d '' commands <<'EOF'
Rscript myscript.R input_file_A.txt
Rscript myscript.R input_file_B.txt
EOF
echo "$commands" | sed -n ${SLURM_ARRAY_TASK_ID}p | bash
```
Option 4: somewhere there's a script (maybe written by Terry?) that does this.
I think option 1 is better, because:
- It is easier to see which commands failed. E.g. imagine task 5 failed: with `grep -F "[5]"` you can jump straight to the executed command line. This is not so straightforward with options 2 and 3.
- Slurm log files are named using the task number within the job array. Given this, if you see `slurm-123845_5.out`, you know you can find the corresponding command with `grep -F "[5]"`.
I would change `bash -c "${commands[${SLURM_ARRAY_TASK_ID}]}"` to `eval "${commands[${SLURM_ARRAY_TASK_ID}]}"`.
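The practical difference, as a minimal sketch with made-up variable names: `eval` runs the command in the current shell, so it can see variables and functions defined earlier in the batch script, while `bash -c` starts a fresh (non-inheriting) shell that cannot:

```shell
# a variable defined earlier in the (hypothetical) batch script
greeting="hello"
cmd='echo "$greeting world"'

eval "$cmd"     # runs in this shell: prints "hello world"
bash -c "$cmd"  # fresh shell where $greeting is unset: prints " world"
```

(Exported variables would still be visible to `bash -c`; plain assignments and shell functions would not.)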
Slurm scripts are limited to 4 MB. If thousands of jobs are listed in this manner and the command lines are long, I recommend saving characters. E.g. instead of `commands`, use `c`, `j`, or any single letter. Make the R (or Python) script executable to avoid repeating `Rscript`. And so on.
Is there a way to generalize the syntax so it is possible to eventually replace SLURM with another scheduler?
Goncalo
I've been recommending putting the commands into a separate file, one per line, then using head/tail to pull lines out like this:
```bash
srun $(head -n $SLURM_ARRAY_TASK_ID cmds.txt | tail -n 1)
```
This can also be done with sed like:
```bash
srun $(sed -n ${SLURM_ARRAY_TASK_ID}p cmds.txt)
```
* I've seen odd results with the `sed` command on some systems.
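If `sed` misbehaves, `awk` is a portable alternative for pulling out line N. A sketch with a toy `cmds.txt` (the file contents are hypothetical):

```shell
# build a toy cmds.txt, one command per line
printf 'echo one\necho two\necho three\n' > cmds.txt

# print line N only; NR is awk's current line number
N=2
awk -v n="$N" 'NR == n' cmds.txt   # echo two
```

In a real batch script, N would be `$SLURM_ARRAY_TASK_ID`.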
This is pretty generalized as is but most national sites are moving to Slurm, as is Flux/ARC-TS. Any other scheduling system will have the same requirements for users to adapt to. What is the drive to replace slurm? What would you like to gain from another scheduler?
PS: I have a branch in this repo with array job examples, but I have not completed the docs.
Yes, this syntax also works for PBS on Flux (only the names of the env. variables are different).
Also, using `head`/`tail` or `sed` makes it not bash-specific.
There's no drive to replace slurm, but we want anyone to be able to setup a PheWeb instance and most people who might do that won't be running slurm.
It would be nice to be able to adapt the code minimally to add support for other schedulers as needed.
We are not going to have widespread deployment of pheweb if each instance requires slurm.
Goncalo
So, option 5: run `sarray cmds.txt`, where `sarray` is something like:

```bash
#!/bin/bash
cmd_file="$1"
num_jobs="$(cat "$cmd_file" | grep -v "^#" | grep . | wc -l)"  # ignore comments and blank lines
sbatch_args="$(echo $(cat "$cmd_file" | grep -oP '^#SBATCH \K.*'))"  # collect args from #SBATCH lines
sbatch_cmd='eval "$(cat '"$cmd_file"' | grep -v "^#" | grep . | head -n $SLURM_ARRAY_TASK_ID | tail -n1)"'
sbatch $sbatch_args --array=1-$num_jobs --wrap="$sbatch_cmd"
```

(probably placed in `/net/mario/cluster/bin/`) and `cmds.txt` is like:

```
python3 -c 'print(chr(9) == "\t")'
Rscript a.R
Rscript b.R
#SBATCH --mem=1024
```

Features that would be needed:

- extract & print the job id from `sbatch` output; if the `sbatch` output looks interesting, print it verbatim.
- allow users to run `sarray --get 7 cmds.txt` to extract the command for task 7.
  - same as `cat cmds.txt | grep -v '^#' | grep . | sed -n 7p`
- allow users to submit their own `--array=4,7,8` to re-run a subset that failed previously.
  - or just run `sarray --get 4,7,8 cmds.txt > cmds2.txt && sarray cmds2.txt`
- allow users to specify the number of concurrent jobs, like `--array=1-100%5` does.
- maybe put stdout/stderr in `~/tmp/sarray-output/<job_id>/<task_id>` and print that path.
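A possible sketch of the `--get` behavior, equivalent to the `grep`/`sed` pipeline above (the function name and the toy `cmds.txt` contents are hypothetical):

```shell
# print command number n from a cmds file, skipping comments and blank lines
sarray_get() {
    local n="$1" file="$2"
    grep -v '^#' "$file" | grep . | sed -n "${n}p"
}

# example: a cmds.txt with a comment, a blank line, and an #SBATCH line,
# none of which should count as commands
printf '# a comment\nRscript a.R\n\nRscript b.R\n#SBATCH --mem=1024\n' > cmds.txt
sarray_get 2 cmds.txt   # Rscript b.R
```

Note that the `#SBATCH` line is skipped by the same `^#` filter that skips comments, which is what lets the submit script and the command list share one file.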
In that case it is just a matter of developing appropriate abstractions for each target platform.
Right - and those abstractions will be easier to write if the SLURM commands are within a small number of functions rather than peppered throughout code.
Goncalo
PS. Btw, are you around? We will need to do a lot of work in terms of a security policy for the TOPMed contract.
The abstraction should be above the batch script, not within it: at the level of whatever generates the batch script. There could be platforms that don't even need a batch script.
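For illustration, one way that generator layer could look: a single dispatch function whose scheduler-specific parts live in one place, so adding another scheduler only touches this function. The `SCHEDULER` variable, the `run_one.sh` per-task wrapper name, and the flag choices are all assumptions, and the sketch only prints the submit command rather than running it:

```shell
# count runnable lines (skip comments and blanks), then emit the submit
# command for whichever scheduler is configured
submit_array() {
    local cmd_file="$1" ntasks
    ntasks=$(grep -v '^#' "$cmd_file" | grep -c .)
    case "${SCHEDULER:-slurm}" in
        slurm) echo "sbatch --array=1-${ntasks} run_one.sh ${cmd_file}" ;;
        pbs)   echo "qsub -t 1-${ntasks} run_one.sh ${cmd_file}" ;;
        none)  grep -v '^#' "$cmd_file" | grep . | bash ;;  # no scheduler: run serially
    esac
}

printf 'echo a\necho b\n' > cmds.txt
SCHEDULER=slurm submit_array cmds.txt   # sbatch --array=1-2 run_one.sh cmds.txt
```

A real version would execute the command instead of echoing it and would translate resource flags per scheduler, but the shape is the same: one entry point, one `case` per platform.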
I'll be in tomorrow.
Is sarray still around? If so, I should remove those docs. I recommend keeping it simple with `head`/`tail` or `sed` and explaining how to use different `--array=` values.
I cannot find the sarray command, though there are lots of docs still around. runslurm.pl may have replaced it.
No, sarray was something I wrote for the biostat cluster before slurm supported array jobs. It really shouldn't be used anymore.
runslurm.pl is something @tpg wrote for the CSG cluster back when they were transitioning from Mosix to slurm, I think but maybe not.
Tried @schelcj's solution with

```bash
srun $(head -n $SLURM_ARRAY_TASK_ID cmds.txt | tail -n 1)
```

and found that sbatch had trouble parsing the command lines. After changing it to

```bash
bash -c "$(head -n $SLURM_ARRAY_TASK_ID cmds.txt | tail -n 1)"
```

it worked just fine.
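The parsing trouble is most likely word splitting. A minimal reproduction with a toy command line: unquoted `$(...)` splits the substituted command on whitespace and leaves quote characters literal, while `bash -c` re-parses the whole string as shell code:

```shell
line="echo 'a b'"        # one line as it might appear in cmds.txt

# naive expansion word-splits and keeps the quote characters literal:
out1=$($line)            # runs: echo "'a" "b'"
echo "$out1"             # 'a b'

# bash -c re-parses the string, so the quoting is honored:
out2=$(bash -c "$line")  # runs: echo 'a b'
echo "$out2"             # a b
```

So any command in `cmds.txt` that relies on quoting, globbing, or redirection needs the `bash -c` form.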