nextflow
Add map input/output type for processes
New feature
I would like to be able to produce a single channel with multiple named values (i.e. a Map) from my processes, i.e. my_process.out.view() should return:
[a:abc, b:123, c:false]
[a:abc, b:123, c:false]
[a:def, b:456, c:true]
Currently we have a tuple qualifier, which creates an output channel whose items are tuples of other types. However, this tuple is unlabelled, so users have to extract values from the channel by position, which results in confusing code.
It is also possible to produce multiple output channels, each of which has its own name. However, these channels can't easily be combined into a single channel containing maps or tuples, because the merge operator has been deprecated, and in general joining channels by position is discouraged.
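For reference, here is a minimal sketch of the two existing output styles described above, assuming DSL2. The process name describe, its input, and the emit names are hypothetical and only serve as illustration.

```nextflow
// Hypothetical process, used only to illustrate the two existing output styles
process describe {
    input:
    val sample

    output:
    // Style 1: a single channel of unlabelled tuples
    tuple val(sample), path('report.txt'), emit: as_tuple
    // Style 2: one named channel per value
    val sample, emit: name
    path 'report.txt', emit: report

    script:
    """
    echo "${sample}" > report.txt
    """
}

workflow {
    describe(Channel.of('abc', 'def'))

    // Tuple output: fields are only reachable by position
    describe.out.as_tuple.map { it[1] }.view()

    // Named channels: each value lives in its own channel,
    // so related values are no longer grouped in one item
    describe.out.report.view()
}
```

Neither style gives a single channel whose items carry labelled fields, which is the gap this request is about.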
Usage scenario
This would be useful when a user is working with mostly map data in their channels, likely because they want each field to be labelled instead of unlabelled as in a tuple.
Suggest implementation
I would envisage a new map qualifier, which is used like this:
process hmmer_search {
    container "quay.io/biocontainers/hmmer:3.3.2--h1b792b2_1"

    input:
    path profile
    path database

    output:
    map [table: path('table.txt'), human_readable: path('match.txt')]

    script:
    """
    hmmsearch -o match.txt --tblout table.txt ${profile} ${database}
    """
}
+1
A quick workaround is to output a tuple and follow it with a map operator that converts each tuple to a map.
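A minimal sketch of that workaround, assuming DSL2 and reusing the hmmer_search process from the original post. The parameters params.profile and params.database are hypothetical and stand in for however the inputs are actually supplied.

```nextflow
process hmmer_search {
    container "quay.io/biocontainers/hmmer:3.3.2--h1b792b2_1"

    input:
    path profile
    path database

    output:
    // Plain, unlabelled tuple: this is what works today
    tuple path('table.txt'), path('match.txt')

    script:
    """
    hmmsearch -o match.txt --tblout table.txt ${profile} ${database}
    """
}

workflow {
    // params.profile and params.database are hypothetical parameters
    hmmer_search(file(params.profile), file(params.database))

    hmmer_search.out
        // Destructure each tuple by position once, then label the fields
        .map { table, human_readable ->
            [table: table, human_readable: human_readable]
        }
        .view()
}
```

Downstream operators can then use it.table and it.human_readable instead of it[0] and it[1]. The positional knowledge is still needed, but only in the one closure that builds the map.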
This issue is complementary to #2257, which is about multiple named channels whereas this issue is about named values within a channel (e.g. map). Ideally both use cases should be supported.
A single map channel would be used for 1-to-1 relationships whereas multiple named channels would be used for 1-to-many and many-to-many relationships.
+1
Or maybe immutable named tuples? E.g. output: tuple table: path('table.txt'), human_readable: path('match.txt')
Then all current tuple-reliant functionality (e.g. groupKey) works as before, but one can use names instead of indices when manipulating process results, e.g.
input_ch | MY_PROC | map { it.table }
instead of
input_ch | MY_PROC | map { it[0] }