Feature Request: recursive process

Open lindenb opened this issue 5 years ago • 9 comments

New feature

As seen on Twitter (https://twitter.com/yokofakun/status/1248533372630155264), I'd like to have a way to describe a recursive process that calls itself until a condition is met.

I have no idea how it could be described in a DSL.

Usage scenario

  • I have a large list of VCFs that I want to merge in a given region. The number of VCFs is too large for bcftools merge or bcftools concat: it would take days and a lot of memory to load the indexes and merge the variants. The idea is to merge the list of VCFs by divide and conquer.

  • Same example for BAM files.

  • I want to find the rare variants present in my family but absent from 10,000 BAMs:

    • call family-sample1 vs call family-sample2, keep the common variants
    • call family-sample3 only for the variants of the previous step, keep the common variants
    • call family-sample4 only for the variants of the previous step, keep the common variants
    • call family-sample5 only for the variants of the previous step, keep the common variants
    • call control-bams1, remove the variants
    • call control-bams2, remove the variants
    • (...)
    • call control-bamsN, remove the variants; at the end we have the rare variants
  • GATK CombineGVCFs. 10,000 GVCFs to be combined:

    • Combine 100 GVCFs, 100 times
    • Combine the 100 resulting GVCFs
    • Genotype the final GVCF
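
The divide-and-conquer scenarios above can be sketched as a batching plan. This is a minimal illustration in Python, not Nextflow, and it only plans the rounds; it does not run bcftools or GATK. The function name `plan_merge_rounds`, the `pool_size` parameter, and the intermediate file names are all hypothetical.

```python
from typing import List


def plan_merge_rounds(inputs: List[str], pool_size: int) -> List[List[List[str]]]:
    """Return, for each round, the batches of files that would be merged together.

    Batches within one round are independent of each other, so they could
    run in parallel; each round depends only on the outputs of the previous one.
    """
    if pool_size < 2:
        raise ValueError("pool_size must be at least 2")
    rounds = []
    current = list(inputs)
    level = 0
    while len(current) > 1:
        # Split the current file list into consecutive batches of at most pool_size.
        batches = [current[i:i + pool_size] for i in range(0, len(current), pool_size)]
        rounds.append(batches)
        # Each batch yields one intermediate file (hypothetical naming scheme).
        # A batch holding a single file is simply passed through to the next round.
        current = [f"merged.L{level}.{j}.bcf" for j in range(len(batches))]
        level += 1
    return rounds


# Hypothetical input: 10,000 files merged 100 at a time.
vcfs = [f"sample{i}.vcf.gz" for i in range(10_000)]
rounds = plan_merge_rounds(vcfs, pool_size=100)
print([len(batches) for batches in rounds])  # [100, 1]
```

With 10,000 inputs and a pool size of 100, the plan has two rounds: 100 merges of 100 files each, then one merge of the 100 intermediate files, which matches the CombineGVCFs example.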

Suggest implementation

it's not clear to me how you could implement this idea :-)

recursive process  mergeVcf {
recursionPoolSize 10
stopRecursionWhenPoolSize 1
input:
      val vcfs from vcf_list.andThen(self.merged.collect())
output:
      file("merged${task.recursionLevel}.bcf") into merged      
script:
"""
bcftools merge -O b -o merged${task.recursionLevel}.bcf ${vcfs.join(" ")}
"""
}

lindenb avatar Apr 10 '20 10:04 lindenb

see also http://plindenbaum.blogspot.com/2014/12/divide-and-conquer-in-makefile.html (2014)

Divide-and-conquer in a #Makefile : recursivity and #parallelism.

https://pbs.twimg.com/media/B4GoInWIQAAuwp1.png:large

lindenb avatar Apr 10 '20 10:04 lindenb

Interesting. I think it could be done using the feedback pattern

pditommaso avatar Apr 10 '20 10:04 pditommaso

@pditommaso ohh, that's new to me!

lindenb avatar Apr 10 '20 11:04 lindenb

Unless I'm wrong, your solution would be sequential only?

In the example you provided, the lines are added one after the other, one per process execution. So the steps cannot run in parallel, unlike in a divide-and-conquer strategy (?)

Furthermore, when merging BAM files, the main BAM would grow bigger and bigger, and hence the process would get slower and slower...

Am I wrong?
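
To make the comparison concrete, here is a rough Python sketch that counts dependent steps under two simplifying assumptions: the feedback loop absorbs one file per iteration into a single accumulator, and the divide-and-conquer strategy merges up to `pool_size` files per batch. Both function names are hypothetical, and the sketch ignores the actual cost of each merge.

```python
def sequential_steps(n_files: int) -> int:
    """Merges needed when one accumulator absorbs one file per iteration.

    Every merge depends on the previous one, so none can run in parallel.
    """
    return max(n_files - 1, 0)


def tree_rounds(n_files: int, pool_size: int) -> int:
    """Dependent rounds when files are merged in batches of at most pool_size.

    Batches inside a round are independent and could run in parallel.
    """
    if pool_size < 2:
        raise ValueError("pool_size must be at least 2")
    rounds = 0
    while n_files > 1:
        n_files = -(-n_files // pool_size)  # ceiling division: batches this round
        rounds += 1
    return rounds


n = 10_000
print(sequential_steps(n))   # 9999
print(tree_rounds(n, 10))    # 4
```

Under these assumptions, 10,000 files need 9,999 dependent merges in the sequential case but only 4 dependent rounds with a pool size of 10.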

lindenb avatar Apr 10 '20 13:04 lindenb

Umm, thinking about it more, the collect makes things more complicated, and a loop like the one above does not work because the collect would force it to wait for the overall completion.

pditommaso avatar Apr 10 '20 14:04 pditommaso

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Sep 23 '20 14:09 stale[bot]

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Apr 22 '21 12:04 stale[bot]

Not sure it's the same feature as the one envisioned in this issue; however, support for recursion has been added recently: https://github.com/nextflow-io/nextflow/discussions/2521

pditommaso avatar Dec 24 '21 15:12 pditommaso

@pditommaso thank you, I'll have a look at this new feature!

lindenb avatar Dec 24 '21 15:12 lindenb