SLURM-examples

Preferred syntax for one-command-per-line job arrays

Open pjvandehaar opened this issue 8 years ago • 16 comments

  • Option 1: (used by https://github.com/statgen/SLURM-examples/blob/master/job-array-one-command-per-line)

    #!/bin/bash
    #SBATCH --array=1-2
    declare -a commands
    commands[1]="Rscript myscript.R input_file_A.txt"
    commands[2]="Rscript myscript.R input_file_B.txt"
    bash -c "${commands[${SLURM_ARRAY_TASK_ID}]}"
    
  • Option 2:

    #!/bin/bash
    #SBATCH --array=1-2
    commands=(
    "Rscript myscript.R input_file_A.txt"
    "Rscript myscript.R input_file_B.txt"
    )
    bash -c "${commands[${SLURM_ARRAY_TASK_ID}]}"
    
  • Option 3:

    #!/bin/bash
    #SBATCH --array=1-2
    read -d '' commands <<'EOF'
    Rscript myscript.R input_file_A.txt
    Rscript myscript.R input_file_B.txt
    EOF
    echo "$commands" | sed -n ${SLURM_ARRAY_TASK_ID}p | bash
    
  • Option 4: somewhere there's a script (maybe written by Terry?) that does this.

pjvandehaar avatar Jul 12 '17 16:07 pjvandehaar

I think option 1 is better, because:

  • It is easier to see which command failed. E.g. imagine job 5 failed: with grep -F "[5]" you can jump straight to the command line that was executed. This is not as straightforward with options 2 and 3.
  • Slurm log files are named using the task number within the job array. Given this, if you see slurm-123845_5.out, you know its command can be found with grep -F "[5]".

dtaliun avatar Jul 12 '17 16:07 dtaliun

I would change bash -c "${commands[${SLURM_ARRAY_TASK_ID}]}" to eval ${commands[${SLURM_ARRAY_TASK_ID}]}
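For reference, both invocation styles run the stored string through a shell: `bash -c` does it in a child process, while `eval` expands and runs it in the current one. A minimal sanity check, where the stored command is a made-up assumption:

```shell
#!/bin/bash
# Compare `bash -c` and `eval` for running a command stored in a variable.
# `bash -c` spawns a subshell; `eval` runs in the current shell, so with
# `eval` the command can see and modify the script's own variables.
cmd='echo "task one"'
bash -c "$cmd"    # prints: task one
eval "$cmd"       # prints: task one
```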

dtaliun avatar Jul 12 '17 16:07 dtaliun

Slurm scripts are limited to 4 MB. If thousands of jobs are listed in this manner and the command lines are long, I recommend saving characters. E.g. instead of commands just use c or j or any single letter, make the R (or Python) script executable to avoid repeating Rscript, and so on.

dtaliun avatar Jul 12 '17 16:07 dtaliun

Is there a way to generalize the syntax so it is possible to eventually replace SLURM with another scheduler?

Goncalo


abecasis avatar Jul 12 '17 17:07 abecasis

I've been recommending putting the commands into a separate file, one per line, then using head/tail to pull lines out like this:

srun $(head -n $SLURM_ARRAY_TASK_ID cmds.txt | tail -n 1)

This can also be done with sed like:

srun $(sed -n ${SLURM_ARRAY_TASK_ID}p cmds.txt)

* I've seen odd results with the sed command on some systems.
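The head/tail extraction step can be exercised outside of Slurm. A quick sketch, where the cmds.txt contents and the task id value are assumptions made up for demonstration (in a real job, Slurm sets SLURM_ARRAY_TASK_ID):

```shell
#!/bin/bash
# Demonstrate the head/tail line-extraction used in the recipe above.
# SLURM_ARRAY_TASK_ID is set by hand here; Slurm sets it in a real array job.
printf 'Rscript myscript.R input_file_A.txt\nRscript myscript.R input_file_B.txt\n' > cmds.txt
SLURM_ARRAY_TASK_ID=2
# Take the first N lines, then keep only the last of those: line N.
cmd="$(head -n "$SLURM_ARRAY_TASK_ID" cmds.txt | tail -n 1)"
echo "$cmd"   # prints: Rscript myscript.R input_file_B.txt
# Inside a batch script this would then be run as: srun bash -c "$cmd"
rm cmds.txt
```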

This is pretty generalized as is but most national sites are moving to Slurm, as is Flux/ARC-TS. Any other scheduling system will have the same requirements for users to adapt to. What is the drive to replace slurm? What would you like to gain from another scheduler?

ps: i have a branch in this repo with array job examples but i have not completed the docs

schelcj avatar Jul 12 '17 17:07 schelcj

Yes, this syntax is also ok for PBS on flux (only names of env. variables are different).

dtaliun avatar Jul 12 '17 17:07 dtaliun

Also, using head/tail or sed makes it not bash specific.

schelcj avatar Jul 12 '17 17:07 schelcj

There's no drive to replace slurm, but we want anyone to be able to setup a PheWeb instance and most people who might do that won't be running slurm.

It would be nice to be able to adapt the code minimally to add support for other schedulers as needed.

We are not going to have widespread deployment of pheweb if each instance requires slurm.

Goncalo


abecasis avatar Jul 12 '17 19:07 abecasis

So,

  • option 5: run sarray cmds.txt, where sarray is something like:

    #!/bin/bash
    cmd_file="$1"
    num_jobs="$(grep -v '^#' "$cmd_file" | grep -c .)" # ignore comments and blank lines
    sbatch_args="$(echo $(grep -oP '^#SBATCH \K.*' "$cmd_file"))" # collect args from #SBATCH lines
    sbatch_cmd='eval "$(grep -v "^#" '"$cmd_file"' | grep . | head -n $SLURM_ARRAY_TASK_ID | tail -n1)"'
    sbatch $sbatch_args --array=1-$num_jobs --wrap="$sbatch_cmd"
    

    (probably placed in /net/mario/cluster/bin/) and cmds.txt is like:

    python3 -c 'print(chr(9) == "\t")'
    Rscript a.R
    Rscript b.R
    #SBATCH --mem=1024
    

    features that would be needed:

    • extract & print job id from sbatch output. if sbatch output looks interesting, print it verbatim.
    • allow users to run sarray --get 7 cmds.txt to extract the command for task 7.
      • same as cat cmds.txt | grep -v '^#' | grep . | sed -n 7p
    • allow users to submit their own --array=4,7,8 to run a subset that failed previously.
      • or just run sarray --get 4,7,8 cmds.txt > cmds2.txt && sarray cmds2.txt
    • allow users to specify the number of concurrent jobs, like --array=1-100%5 does.
    • maybe put stdout/stderr in ~/tmp/sarray-output/<job_id>/<task_id> and print that path.
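The proposed --get behavior above could be sketched as a small shell function. The name sarray_get and the comma-separated id handling are assumptions, not an existing tool; it just applies the same comment/blank-line filtering as the submission script:

```shell
#!/bin/bash
# Hypothetical sketch of the proposed `sarray --get` behavior: print the
# command(s) for the given comma-separated task id(s), skipping comments
# and blank lines the same way the submission script would.
sarray_get() {
  ids="$1"
  cmd_file="$2"
  for id in $(echo "$ids" | tr ',' ' '); do
    # Drop comment/blank lines, then print the id-th remaining line.
    grep -v '^#' "$cmd_file" | grep . | sed -n "${id}p"
  done
}

# Example command file (contents are made up for demonstration):
printf '# a comment\nRscript a.R\n\nRscript b.R\nRscript c.R\n' > cmds.txt
sarray_get 2 cmds.txt     # prints: Rscript b.R
sarray_get 1,3 cmds.txt   # prints: Rscript a.R, then Rscript c.R
rm cmds.txt
```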

pjvandehaar avatar Jul 12 '17 19:07 pjvandehaar

In that case it is just a matter of developing appropriate abstractions for target platforms to support multiple platforms.


schelcj avatar Jul 12 '17 19:07 schelcj

Right - and those abstractions will be easier to write if the SLURM commands are within a small number of functions rather than peppered throughout code.

Goncalo

PS. Btw, are you around? We will need to do a lot of work in terms of a security policy for the TOPMed contract.


abecasis avatar Jul 12 '17 20:07 abecasis

The abstraction should be above the batch script, not within it: at the level of whatever generates the batch script. There could be platforms that don't even need a batch script.

I'll be in tomorrow.


schelcj avatar Jul 12 '17 21:07 schelcj

Is sarray still around? I should remove those docs. I recommend keeping it simple with head/tail or sed and explaining how to use different --array= values.

schelcj avatar Jul 12 '17 23:07 schelcj

I cannot find the sarray command, though there are lots of docs still around. runslurm.pl may have replaced it.

pjvandehaar avatar Jul 15 '17 14:07 pjvandehaar

No, sarray was something I wrote for the biostat cluster before slurm supported array jobs. It really shouldn't be used anymore.

runslurm.pl is something @tpg wrote for the CSG cluster back when they were transitioning from MOSIX to slurm, I think, but maybe not.


schelcj avatar Jul 15 '17 18:07 schelcj

Tried @schelcj's solution with srun $(head -n $SLURM_ARRAY_TASK_ID cmds.txt | tail -n 1) and found that sbatch had trouble parsing the command lines. After changing it to bash -c "$(head -n $SLURM_ARRAY_TASK_ID cmds.txt | tail -n 1)" it worked just fine.

ilarsf avatar Dec 05 '17 13:12 ilarsf