nextflow
Feature Request: recursive process
New feature
As seen on Twitter (https://twitter.com/yokofakun/status/1248533372630155264), I'd like to have a way to describe a recursive process that would call itself until a condition is met.
I have no idea how it could be described in a DSL:
Usage scenario
- I have a large list of VCFs that I want to merge in a given region. The number of VCFs is too large for bcftools merge or bcftools concat: it would take days and a lot of memory to load the indexes and merge the variants. The idea is to merge the list of VCFs by divide and conquer.
- Same example for BAM.
- I want to find the rare variants in my family that are absent from 10,000 BAMs:
  - call family-sample1 vs call family-sample2, keep the common variants
  - call family-sample3 only for the variants of the previous step, keep the common variants
  - call family-sample4 only for the variants of the previous step, keep the common variants
  - call family-sample5 only for the variants of the previous step, keep the common variants
  - call control-bams1, remove the variants
  - call control-bams2, remove the variants
  - (...)
  - call control-bamsN, remove the variants

  At the end we have the rare variants.
- GATK CombineGVCFs. 10,000 GVCFs to be combined:
  - Combine 100 GVCFs * 100
  - Combine 100 GVCFs
  - Genotype the last GVCF
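To make the batching in these scenarios concrete, here is a minimal Python sketch that only counts how many merge jobs each round would need. The function name `plan_rounds` and its parameters are hypothetical, chosen for illustration; nothing here is Nextflow or GATK API.

```python
import math


def plan_rounds(n_inputs: int, pool_size: int) -> list[int]:
    """Return the number of merge jobs needed in each round.

    Each job merges up to `pool_size` files into one, and rounds repeat
    until a single file remains. Hypothetical helper, not a real API.
    """
    if n_inputs < 1:
        raise ValueError("n_inputs must be at least 1")
    if pool_size < 2:
        raise ValueError("pool_size must be at least 2")

    rounds = []
    remaining = n_inputs
    while remaining > 1:
        # One job per batch of up to pool_size files.
        n_jobs = math.ceil(remaining / pool_size)
        rounds.append(n_jobs)
        # Each job yields one merged file for the next round.
        remaining = n_jobs
    return rounds


# The GVCF scenario above: 10,000 inputs merged 100 at a time.
print(plan_rounds(10_000, 100))
```

For the GVCF case this gives 100 jobs in the first round and a single job in the second, matching the "Combine 100 GVCFs * 100, then Combine 100 GVCFs" plan. All jobs within one round are independent of each other, which is where the parallelism would come from.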
Suggest implementation
it's not clear to me how you could implement this idea :-)
    recursive process mergeVcf {
        recursionPoolSize 10
        stopRecursionWhenPoolSize 1
        input:
        val vcfs from vcf_list.andThen(self.merged.collect())
        output:
        file("merged${task.recursionLevel}.bcf") into merged
        script:
        """
        bcftools merge -O b -o merged${task.recursionLevel}.bcf ${vcfs.join(" ")}
        """
    }
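Since the syntax above is hypothetical, the intended behaviour can be simulated outside Nextflow. The sketch below is plain Python that only builds the command strings, level by level, without running anything. The names `plan_merges` and `chunks` are made up for illustration. Two details are assumptions that differ from the pseudocode: the output name carries a batch index in addition to the recursion level, so that parallel jobs at the same level do not write to the same file, and a batch holding a single file is carried forward unchanged instead of being merged. Indexing of the intermediate BCFs is left out.

```python
def chunks(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def plan_merges(vcfs, pool_size=10):
    """Plan bcftools merge commands level by level (nothing is executed).

    Returns (commands, final_file), where commands is a list of
    (level, command string) tuples. Hypothetical helper for illustration.
    """
    if pool_size < 2:
        raise ValueError("pool_size must be at least 2")
    if not vcfs:
        raise ValueError("need at least one input file")

    commands = []
    current = list(vcfs)
    level = 0
    while len(current) > 1:  # plays the role of stopRecursionWhenPoolSize 1
        level += 1
        next_level = []
        for batch_index, batch in enumerate(chunks(current, pool_size)):
            if len(batch) == 1:
                # Assumption: a lone file is passed through as-is.
                next_level.append(batch[0])
                continue
            # Assumption: batch index added to keep output names unique.
            out = f"merged{level}_{batch_index}.bcf"
            cmd = f"bcftools merge -O b -o {out} {' '.join(batch)}"
            commands.append((level, cmd))
            next_level.append(out)
        current = next_level
    return commands, current[0]


inputs = [f"sample{i}.vcf.gz" for i in range(25)]
commands, final_file = plan_merges(inputs, pool_size=10)
for level, cmd in commands:
    print(level, cmd[:60], "...")
print("final:", final_file)
```

Commands sharing the same level have no dependencies on each other, so a scheduler could run them in parallel; only the levels themselves are sequential.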
See also http://plindenbaum.blogspot.com/2014/12/divide-and-conquer-in-makefile.html (2014): Divide-and-conquer in a #Makefile: recursivity and #parallelism.

Interesting. I think it could be done using the feedback pattern
@pditommaso ohh, that's new to me!
Unless I'm wrong, your solution would be sequential only?
In the example you provided, the lines are added one after the other by each process. So the steps cannot be parallelized, unlike in a divide-and-conquer strategy (?)
Furthermore, when merging BAM files, the main BAM would become bigger and bigger, and hence the process slower and slower...
Am I wrong?
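A rough way to quantify this concern is a toy cost model in Python. It assumes every input file has size 1 and that a merged file is as large as the sum of its inputs, which is a simplification and not a measurement. The function names are hypothetical. It compares a sequential accumulation (one new file merged into a growing main file at each step) with a divide-and-conquer merge.

```python
import math


def sequential_cost(n: int) -> tuple[int, int]:
    """Accumulate n unit-size files one at a time into a growing file.

    Returns (dependent steps, total units read), under the toy model.
    """
    steps = n - 1
    # Step with an accumulator of size acc reads acc units plus 1 new file.
    units_read = sum(acc + 1 for acc in range(1, n))
    return steps, units_read


def tree_cost(n: int, pool_size: int) -> tuple[int, int]:
    """Merge n unit-size files in rounds of batches of up to pool_size.

    Returns (dependent rounds, total units read). Each round reads at
    most the equivalent of all n original files once.
    """
    rounds = 0
    units_read = 0
    remaining = n
    while remaining > 1:
        rounds += 1
        units_read += n
        remaining = math.ceil(remaining / pool_size)
    return rounds, units_read


print("sequential:", sequential_cost(10_000))
print("tree, pool of 100:", tree_cost(10_000, 100))
```

Under these assumptions, the sequential approach has a chain of dependent steps that grows linearly with the number of files and a total amount of data read that grows quadratically, while the tree needs only a logarithmic number of dependent rounds.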
Umm, thinking about it more, the collect makes things more complicated, and something like the loop above does not work because the collect would force it to wait for the overall completion.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Not sure it's the same feature as the one envisioned in this issue; however, support for recursion has been added recently: https://github.com/nextflow-io/nextflow/discussions/2521
@pditommaso thank you, I'll have a look at this new feature!