Running a looper pipeline ad hoc

Open nleroy917 opened this issue 11 months ago • 4 comments

I wonder whether there are either 1) existing solutions for this or 2) easy ways to add the ability to run a looper pipeline in an ad hoc manner. What I mean is this: occasionally, the overhead of setting up a traditional workflow can be a bit daunting, but I really enjoy the ease of dispatching jobs through slurm+looper.

I would love to replace traditional bash for loops with looper calls.

An example

I have a folder with hundreds of mixed-type files. Some of these might be bedGraph files. I want to convert these to .bw format. I can use bigtools bedGraphToBigWig. Traditionally, I might just use a for loop:

for file in *.bdg; do
  # Quote the variable so file names containing spaces are handled correctly
  bigtools bedGraphToBigWig "$file" "$file.bw"
done

But this takes a while since it processes the files one by one, and there are hundreds. I'd love to fire them all off at once using looper and slurm:

ls *.bdg | looper run "bigtools bedGraphToBigWig {$1} {$1}.bw"

I suppose I am trying to identify, or nail down, a potential gap between traditional workflows and the flexibility researchers often need for quick, ad hoc job submission.

nleroy917 avatar Nov 26 '24 00:11 nleroy917

I guess the conditions for this to be useful would be:

  1. Extremely small PEP (one sample attribute)
  2. Extremely simple pipeline (bash or python one liner)
  3. Benefits from parallelization

nleroy917 avatar Nov 26 '24 02:11 nleroy917

@nleroy917 this is a good idea. IIRC, way back in time, @nsheff had an example or two like this which sort of "pushed the limits" / "thought outside the box" (if I'm permitted some clichés) of looper in this way. Maybe he already has a working example, or something close to this, that would be a good starting point?

vreuter avatar Nov 26 '24 10:11 vreuter

From infrastructure on December 3rd, 2024:

There are two things to solve:

  1. What to do with the command template? Maybe using -y (command-extra-override) is a way to provide a command template when there was none to begin with?
  2. Can we make a PEP on the fly given some form of input? Sure... we could make it accept stdin, and then what I wrote above would work...?
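
For the second point, a minimal sketch of what "a PEP on the fly" could look like as a stopgap outside of looper: a small bash function that turns file names on stdin into a one-attribute CSV sample table. The function name make_sample_table and the column name file are hypothetical, chosen only for illustration; sample_name is the usual PEP sample identifier column. This is not an existing looper feature, and it assumes file names contain no commas.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of looper): read file paths from stdin and
# print a minimal CSV sample table with one attribute per sample.
make_sample_table() {
  local path name
  printf 'sample_name,file\n'
  # The "|| [ -n ... ]" part keeps a final line that lacks a trailing newline
  while IFS= read -r path || [ -n "$path" ]; do
    [ -n "$path" ] || continue      # skip blank lines
    name=$(basename -- "$path")     # drop any leading directories
    name=${name%.*}                 # strip the last extension, e.g. ".bdg"
    printf '%s,%s\n' "$name" "$path"
  done
}

# Example usage (assumption: run from the directory holding the files):
#   ls *.bdg | make_sample_table > samples.csv
```

The resulting samples.csv could then be referenced from a small PEP config, which would leave only the command template question from the first point open.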

nleroy917 avatar Dec 03 '24 18:12 nleroy917

Just putting this here for reference: I went down the rabbit hole a bit further, and it is possible to parallelize natively in bash using xargs:

ls *.bdg | xargs -P "$(nproc)" -I {} bigtools bedGraphToBigWig {} {}.bw

Of course, this only gives a speedup when $(nproc) returns a value greater than one, so you would still need to allocate some cores for yourself. It's an interesting stopgap, but I still think the looper version proposed above would be way better.
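
A slightly more defensive sketch of the same xargs idea, in case any file names contain spaces: pass the names NUL-delimited and let xargs substitute them directly as arguments. The wrapper name parallel_each is hypothetical, and -0 and -P are GNU/BSD xargs extensions rather than POSIX, so this assumes one of those implementations.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run a command once per NUL-delimited item on stdin,
# with up to $1 processes in parallel. "{}" in the command is replaced by
# the item, including inside longer arguments such as "{}.bw".
parallel_each() {
  local jobs=$1
  shift
  xargs -0 -P "$jobs" -I {} "$@"
}

# Example usage (assumes bigtools is on PATH and *.bdg matches some files):
#   printf '%s\0' *.bdg | parallel_each "$(nproc)" bigtools bedGraphToBigWig {} {}.bw
```

Because each item is handed to the command as a single argument, no extra bash -c layer or manual quoting is needed.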

nleroy917 avatar Dec 04 '24 15:12 nleroy917