looper
How to parallel-process files with looper locally: lump samples together, so that several commands are grouped into one submission script.

This was originally a divvy idea. The issue: with 100 files, divvy submits to a cluster with no problem, but run locally they execute serially. They could instead run as background processes, i.e., in the command shell script with &. So can we lump 100 samples into, say, 10 background processes, with a new divvy template to accomplish that?
- Original Notes
Currently, if using lump for samples locally (--lump-n), we get a couple of submission scripts like these:
```bash
#!/bin/bash
echo 'Compute node:' `hostname`
echo 'Start time:' `date +'%Y-%m-%d %T'`
{
python3 /tmp/tmplr4qcfsa/advanced/pipeline/pipeline1.py --sample-name sample1 --req-attr val
python3 /tmp/tmplr4qcfsa/advanced/pipeline/pipeline1.py --sample-name sample2 --req-attr val
} | tee /tmp/tmplr4qcfsa/advanced/results/submission/PIPELINE1_lump1.log
```
```bash
#!/bin/bash
echo 'Compute node:' `hostname`
echo 'Start time:' `date +'%Y-%m-%d %T'`
{
python3 /tmp/tmplr4qcfsa/advanced/pipeline/pipeline1.py --sample-name sample3 --req-attr val
} | tee /tmp/tmplr4qcfsa/advanced/results/submission/PIPELINE1_lump3.log
```
Perhaps we can change the template to use & for running in parallel?
```bash
#!/bin/bash
echo 'Compute node:' `hostname`
echo 'Start time:' `date +'%Y-%m-%d %T'`
{
python3 /tmp/tmplr4qcfsa/advanced/pipeline/pipeline1.py --sample-name sample1 --req-attr val &
python3 /tmp/tmplr4qcfsa/advanced/pipeline/pipeline1.py --sample-name sample2 --req-attr val &
python3 /tmp/tmplr4qcfsa/advanced/pipeline/pipeline1.py --sample-name sample3 --req-attr val &
} | tee /tmp/tmplr4qcfsa/advanced/results/submission/PIPELINE1_lump1.log
```
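One refinement worth considering, shown as a minimal sketch in plain bash (not an existing divvy template): adding a `wait` before the closing brace makes the grouped block itself wait for the backgrounded runs, rather than relying on `tee` holding the pipe open until the last job closes its stdout.

```bash
#!/bin/bash
echo 'Compute node:' `hostname`
echo 'Start time:' `date +'%Y-%m-%d %T'`
{
python3 /tmp/tmplr4qcfsa/advanced/pipeline/pipeline1.py --sample-name sample1 --req-attr val &
python3 /tmp/tmplr4qcfsa/advanced/pipeline/pipeline1.py --sample-name sample2 --req-attr val &
python3 /tmp/tmplr4qcfsa/advanced/pipeline/pipeline1.py --sample-name sample3 --req-attr val &
wait  # block until every backgrounded pipeline run has finished
} | tee /tmp/tmplr4qcfsa/advanced/results/submission/PIPELINE1_lump1.log
```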
I don't believe looper currently allows parallel processing when using a local machine (i.e., non-SLURM submission); I checked the divvy templates to be sure. I'm going to close this issue for now, and we can open a new one to add this enhancement if it is desired.
There were at one point divvy templates that would allow this, but there was no way to control how many jobs were submitted.
Basically, if you add an ampersand at the end of a command, the shell runs it in the background.
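On the job-count concern: a local template could throttle the backgrounded commands itself. Here's a minimal sketch assuming bash 4.3+ (for `wait -n`); the sample names, pipeline path, and `MAX_JOBS` knob are illustrative, not part of looper or divvy:

```bash
#!/bin/bash
# Sketch only: run samples in parallel, but never more than MAX_JOBS at once.
MAX_JOBS=10

run_sample() {
  python3 pipeline1.py --sample-name "$1" --req-attr val
}

for sample in sample1 sample2 sample3; do   # imagine 100 samples here
  while [ "$(jobs -rp | wc -l)" -ge "$MAX_JOBS" ]; do
    wait -n   # bash 4.3+: block until any one background job exits
  done
  run_sample "$sample" &
done
wait   # let the remaining jobs drain before the script exits
```

An alternative with the same effect is `xargs -P`, which caps parallelism without depending on the bash version.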