Running a looper pipeline ad hoc


I wonder whether there are either 1) existing solutions for this or 2) easy ways to add the ability to run a looper pipeline in an ad hoc manner. What I mean is this: occasionally, the overhead of setting up a traditional workflow can be a bit daunting, but I really enjoy the ease of dispatching jobs through slurm+looper.

I would love to replace traditional bash for loops with looper calls.

An example

I have a folder with hundreds of mixed-type files. Some of these might be bedGraph files. I want to convert these to .bw format. I can use bigtools bedGraphToBigWig. Traditionally, I might just use a for loop:

for file in *.bdg; do
  # Quote the variable so file names with spaces survive
  bigtools bedGraphToBigWig "$file" "$file.bw"
done

But this takes a while since it goes one by one, and there are hundreds. I'd love to fire them all off at once using looper and slurm:

ls *.bdg | looper run "bigtools bedGraphToBigWig {$1} {$1}.bw"
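For comparison, here is a rough sketch of what such an ad hoc call would boil down to if done by hand with Slurm alone. It is not looper functionality: `submit_each` and the `DRY_RUN` variable are hypothetical names made up for this illustration, and the sketch uses an xargs-style `{}` placeholder instead of the `{$1}` syntax proposed above. The only real Slurm feature assumed is `sbatch --wrap`, which submits a single command line as a job.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of looper): read one file name per line from
# stdin, fill it into a command template, and submit each command to Slurm.
# Every "{}" in the template is replaced with the shell-quoted file name.
submit_each() {
  local template=$1 file quoted cmd
  while IFS= read -r file; do
    [ -n "$file" ] || continue            # skip empty lines
    printf -v quoted '%q' "$file"         # quote spaces and other specials
    cmd=${template//\{\}/"$quoted"}       # substitute every {} occurrence
    if [ -n "${DRY_RUN:-}" ]; then
      # Dry run: print the submission instead of sending it to Slurm.
      printf 'sbatch --wrap %q\n' "$cmd"
    else
      sbatch --wrap "$cmd"
    fi
  done
}

# Example (dry run, nothing is submitted):
#   printf '%s\n' *.bdg | DRY_RUN=1 submit_each 'bigtools bedGraphToBigWig {} {}.bw'
```

This gives one job per file but none of what looper adds on top (sample metadata, compute settings, status tracking), which is why a looper-native version still seems worthwhile.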

I suppose I am trying to identify or nail down a potential gap between traditional workflows and the flexibility researchers often need for quick, ad hoc job submission.

Nov 26 '24 00:11 nleroy917

I guess the conditions for this to be useful would be:

Extremely small PEP (one sample attribute)
Extremely simple pipeline (bash or python one liner)
Benefits from parallelization

Nov 26 '24 02:11 nleroy917

@nleroy917 this is a good idea. IIRC, way back in time, @nsheff had an example or two like this which sort of "pushed the limits" / "thought outside the box" (if I'm permitted some clichés) of looper in this way. Maybe he already has a working example, or something close to it, that would be a good starting point?

Nov 26 '24 10:11 vreuter

From infrastructure on December 3rd, 2024:

There are two things to solve:

What to do with the command template? Maybe using -y (command-extra-override) is a way to provide a command template when there was none to begin with?
Can we make a PEP on the fly from some source of input? Sure... we could make it accept stdin, and then what I wrote would work...?
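To make the second point concrete, here is a minimal sketch of building a sample table from stdin. It assumes that a CSV with a `sample_name` column plus one extra attribute is enough to serve as the on-the-fly PEP. The function name `make_sample_table` and the `file` column are hypothetical choices for this illustration, and nothing here is existing looper behavior.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn file names on stdin (one per line) into a
# two-column CSV sample table: sample_name plus a single "file" attribute.
# Assumption: file names contain no commas or newlines (no CSV quoting is done).
make_sample_table() {
  local file
  printf 'sample_name,file\n'
  while IFS= read -r file; do
    [ -n "$file" ] || continue            # skip empty lines
    # Sample name = file name without its directory and without the .bdg suffix
    printf '%s,%s\n' "$(basename "$file" .bdg)" "$file"
  done
}

# Example:
#   printf '%s\n' *.bdg > files.txt
#   make_sample_table < files.txt > samples.csv
```

The command template would then only need to refer to that one attribute, which matches the "extremely small PEP" condition listed earlier in the thread.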

Dec 03 '24 18:12 nleroy917

Just putting here for reference, I went down the rabbit hole slightly more and it is possible to parallelize natively using bash; just use xargs:

printf '%s\0' *.bdg | xargs -0 -P "$(nproc)" -I {} bigtools bedGraphToBigWig {} {}.bw

This only helps when $(nproc) returns a value greater than one, of course... so you would still need to allocate some cores for yourself. It's an interesting stop-gap, but I still think the looper version proposed above would be way better.

Dec 04 '24 15:12 nleroy917
