Optional inputs for DSL2
New feature
Pinging @rsuchecki @pditommaso (I couldn't find an issue with this, I hope it's okay that I open a new one).
Based on a small conversation on the Gitter (1 - primary | 2), there's interest (a lot of it from me) in having more direct support for optional inputs. This seems in line with the goals of DSL2 to produce reusable tool modules / interfaces.
Other workflow specifications have the concept of tool wrappers, which aim to be "write once, use in all of your workflows". This means the tool wrapper contains most (if not all) of the available configuration options, from which the command line is dynamically constructed. This allows the community to build and contribute high-quality tool wrappers, for example Common Workflow Library (CWLibrary#fastqc) and BioWDL (BioWDL#fastqc), with the tools available for other users to use or to upload to stores like Dockstore or the Galaxy toolshed.
Projects like aCLImatise aim to generate tool wrappers, as writing them is usually a significantly time-consuming part of building workflows.
The DSL2 makes good strides towards this, and a stronger concept for optional inputs would take this further.
Relevant discussion:
Command line construction sidenote
I think it would be a bad idea to create a new syntax for building or interpolating command lines, but tool developers could use the Groovy environment to build the string for each command option.
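As a rough illustration of building option strings in plain Groovy, the sketch below renders a flag only when a value is present. The helper names `opt` and `buildFastqc` are hypothetical and not part of Nextflow; `-c` is the contaminant option from the usage scenario in this issue, and `-t` is assumed to be the FastQC threads option.

```groovy
// Hypothetical helper (not a Nextflow API): render "<flag> <value>" only
// when a value is present. Groovy truth treats null, '' and empty
// collections as false (and also 0, which is fine for this sketch).
def opt(String flag, value) {
    value ? "${flag} ${value}".toString() : ''
}

// Hypothetical wrapper: assemble a fastqc command from optional settings.
def buildFastqc(Map cfg, List reads) {
    def parts = ['fastqc', opt('-c', cfg.contaminants), opt('-t', cfg.threads)] + reads
    // Drop the empty fragments so no stray spaces end up in the command
    parts.findAll { it }.join(' ')
}

println buildFastqc([contaminants: 'contam.txt'], ['a.fq', 'b.fq'])
// prints: fastqc -c contam.txt a.fq b.fq
```

Inside a process, the same helpers could be called from the `script:` block before the command string is returned.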
Usage scenario
Consider fastqc (eg: nf-core module definition), which might have the (simplified) command structure:
fastqc \
    [-c contaminant file] \
    [ ... other config options ] \
    seqfile1 .. seqfileN
I could build a process definition that encapsulates these optional ways of configuring the tool.
This process definition is purely hypothetical; it is just one way I could think of doing it.
process FASTQC {
    input:
    tuple val(name),
        Optional[path(contaminant)],
        path(reads)
    output:
    path("*.zip"), emit: zip
    script:
    contaminant_script = (contaminant != null) ? "--contaminant ${contaminant}" : ""
    reads_script = reads.join(' ')
    """
    fastqc \
        ${contaminant_script} \
        ${reads_script}
    """
}
But usage of imported modules in DSL2 in a workflow requires positional arguments, so you would have something like:
include { FASTQC as fastqc } from './tools/fastqc'
workflow {
    fastqc(params.name, null, params.reads)
}
Suggest implementation
As @rsuchecki noted on Gitter:
Things are very flexible for val inputs, but understandably get more complex when files/paths are involved as they need to be staged. Tuples are nice and keep things organised but are still an extension of the same idea of positional inputs.
I'd hope to avoid the use of positional arguments, because a bare positional value (such as the null in the call above) gives no context for what it represents.
There are also some tools that can have multiple types of input files (actually any combination of those inputs). As such, none of them are mandatory, but you need at least one. For instance, if we look at read assemblers such as megahit, you can do either:
# Case 1: paired-end reads
megahit -1 sample1_R1.fastq.gz,sample2_R1.fastq.gz -2 sample1_R2.fastq.gz,sample2_R2.fastq.gz
# Case 2: paired-end, interleaved reads
megahit --12 sample1.fastq.gz,sample2.fastq.gz
# Case 3: single-end reads
megahit -r reads_single.fastq.gz
# Case 4: multiple input types combined
megahit -1 sample1_paired_R1.fastq.gz,sample2_paired_R1.fastq.gz \
    -2 sample1_paired_R2.fastq.gz,sample2_paired_R2.fastq.gz \
    -r sample1_unpaired.fastq.gz,sample2_unpaired.fastq.gz
# And more...
Lately I had trouble handling this case with the DSL2 syntax in a clean way.
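One way to picture the "any combination, but at least one" rule is a plain Groovy function that takes each input type as an optional list and builds the command from whichever lists are non-empty. This is only a sketch: `megahitCmd` and its key names (`r1`, `r2`, `interleaved`, `single`) are hypothetical, and only the `-1`, `-2`, `--12` and `-r` flags come from the cases above.

```groovy
// Hypothetical helper: every input type is optional, but at least one
// must be given, and paired reads must come in matching R1/R2 lists.
def megahitCmd(Map reads) {
    def r1 = reads.r1 ?: []
    def r2 = reads.r2 ?: []
    def interleaved = reads.interleaved ?: []
    def single = reads.single ?: []

    if (r1.size() != r2.size()) {
        throw new IllegalArgumentException('r1 and r2 must contain the same number of files')
    }
    if (!(r1 || interleaved || single)) {
        throw new IllegalArgumentException('at least one type of input is required')
    }

    def parts = ['megahit']
    // An empty list is false in Groovy, so absent inputs add nothing
    if (r1) parts += ['-1', r1.join(','), '-2', r2.join(',')]
    if (interleaved) parts += ['--12', interleaved.join(',')]
    if (single) parts += ['-r', single.join(',')]
    parts.join(' ')
}

println megahitCmd(single: ['reads_single.fastq.gz'])
// prints: megahit -r reads_single.fastq.gz
```

Named map keys are used here instead of positional parameters so that each list carries its own context at the call site.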
I managed to find a solution (not as clean as I would have hoped). https://github.com/nf-core/sarek/blob/a7679b9b5c178351b1e96a3ffe7ee81ddf9aad06/main.nf#L226
Which I later use in a clean manner in a process: https://github.com/nf-core/sarek/blob/dsl2/modules/nf-core/software/qualimap_bamqc.nf
Yep, this would be really nice. Using NO_FILE as suggested here doesn't work for optional inputs on AWS, as @apeltzer found.
Another solution is to have a dummy file in the pipeline repo that you can stage if the actual file isn't required in the process e.g. initiated here and used here.
This also means you won't have to write anything to the results directory as suggested by @MaxUlysse.
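The placeholder-file workaround boils down to comparing the staged file's name against a known sentinel name. A minimal plain-Groovy sketch, assuming a hypothetical placeholder called `dummy_file.txt` and a hypothetical helper `optionalFileArg` (neither is taken from the linked pipelines):

```groovy
import java.nio.file.Path
import java.nio.file.Paths

// Hypothetical helper: emit "<flag> <path>" unless the staged file is the
// placeholder, in which case the option is left out of the command.
def optionalFileArg(String flag, Path file, String placeholder = 'dummy_file.txt') {
    file.fileName.toString() == placeholder ? '' : "${flag} ${file}".toString()
}

println optionalFileArg('-c', Paths.get('contaminants.txt'))
// prints: -c contaminants.txt
```

Within a process, the equivalent check would sit in the `script:` block, with the placeholder passed as the `path` input whenever the real file is not needed.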
It looks like there are a couple of common workarounds:
- Files without a value (so purely optional inputs): a placeholder file.
- Passing some set of configuration options: a few people seem to use val(meta).
But maybe it would also help to recognise a few common patterns of arguments that tools may require, to better wrap a "tool interface":
- Mutually exclusive sets of arguments.
- Accepted values (a range or set of values).
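To make these two patterns concrete, here is a small plain-Groovy sketch of what such checks could look like. Everything in it is an assumption for illustration: `checkArgs`, its parameters and the argument names are hypothetical, and nothing like it exists in Nextflow today.

```groovy
// Hypothetical validator: returns a list of error messages (empty if OK).
//   exclusive: groups of argument names of which at most one may be set
//   allowed:   argument name -> collection or range of accepted values
def checkArgs(Map args, List<List<String>> exclusive = [], Map allowed = [:]) {
    def errors = []
    exclusive.each { group ->
        def given = group.findAll { args[it] != null }
        if (given.size() > 1) {
            errors << "mutually exclusive: ${given.join(', ')}".toString()
        }
    }
    allowed.each { name, values ->
        // 'in' works for both collections (contains) and ranges (bounds check)
        if (args[name] != null && !(args[name] in values)) {
            errors << "${name}=${args[name]} is not an accepted value".toString()
        }
    }
    errors
}

println checkArgs([paired: 'a.fq', interleaved: 'b.fq'], [['paired', 'interleaved']])
// prints: [mutually exclusive: paired, interleaved]
```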
Just nudging @rsuchecki and @pditommaso to see if you guys have any thoughts.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
I would like to see such a feature. @drpatelh did you find any hack to make it work?
I haven't explored some of the workarounds listed above, but I also agree that implementing some form of optional input syntax for DSL2 would be very useful.
I haven't, I'm afraid. I have resorted to staging "dummy" files to bypass this. See discussion here. Maybe there is a better solution.
Not ideal, but another workaround to use an optional input without having to stage a dummy file is to pass an empty list as the input path.
This script worked on AWS Batch:
nextflow.enable.dsl=2
process CAT_FILES {
    input:
    path files_to_cat // list of paths
    path optional // optional file
    output:
    path 'out.txt'
    script:
    def args = ['cat']
    files_to_cat.each { args.add(it) }
    if (optional) args.add(optional[0]) // or optional.each { args.add(it) }
    args.add("> out.txt")
    args.join(' ')
}
workflow {
    CAT_FILES(['file1.txt', 'file2.txt'], [])
}
An optional path is just a list of paths with size 1 or 0.
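The "list of size 0 or 1" idea can be sketched in plain Groovy by normalising whatever is passed (nothing, a single value, or a list) into a list before building the command. The helpers `asList` and `catCmd` are hypothetical names used only for this illustration.

```groovy
// Hypothetical helper: turn null, a single value or a collection into a
// list, so that "optional" simply means "a list that may be empty".
def asList(x) {
    if (x == null) return []
    x instanceof Collection ? (x as List) : [x]
}

// Build the same cat command as the process above from normalised inputs.
def catCmd(files, optional = []) {
    (['cat'] + asList(files) + asList(optional) + ['> out.txt']).join(' ')
}

println catCmd(['file1.txt', 'file2.txt'], [])
// prints: cat file1.txt file2.txt > out.txt
```

Normalising first avoids having to index into the optional input and treats a single value and a one-element list the same way.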
Wanting to bump this - having clear syntax for optional inputs would be really helpful.
Bump
Related https://github.com/nextflow-io/nextflow/pull/2710
Coming back to bump again ;)
I just encountered this in the kallisto quant module and had to change the module's main.nf (which I'd rather avoid). Totally support this issue!