`filterMap` operator
New feature: `filterMap` operator
Nextflow currently supports both the `filter` and `map` operators against a queue channel. It is very common to combine these, so it may be useful, as a convenience, to have a single `filterMap` operator that does both operations.
Usage scenario
Before:
myChannel
| filter { someCriteria(it) }
| map { someMap(it) }
| // etc...
After:
myChannel
| filterMap { someCriteria(it) ? someMap(it) : null }
| // etc...
The above may not look like much of an improvement, but bear in mind that it's maximally generalised. Maybe an outer join would be a more realistic example:
// Channel of the keys in myChannelA that are not in myChannelB
// NOTE For sake of the example:
// * myChannelA emits [ meta ]
// * myChannelB emits [ meta, etc ] and is *not* empty
myChannelA
| join(myChannelB, remainder: true)
| filterMap { meta, etc -> etc ? null : meta }
Suggest implementation
I've modelled this on Rust's `std::iter::Iterator::filter_map`, whose closure returns an `Option<T>`: if it's `Some(T)`, then that's the matched and mapped value (of type `T`); if it's `None`, then the filter skips. Groovy/Java (AFAIK) doesn't have an equivalent of `Option<T>`, so I've gone with not-null and `null`, respectively (presuming it's unrealistic for the mapping function to return `null`).
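The null-as-skip convention described above can be sketched outside Nextflow. This is only an illustration in plain Java, with a `List` standing in for a queue channel; the class `FilterMapNullSketch` and the helpers `filterMap` and `tenfoldIfEven` are hypothetical names, not part of Nextflow or of any proposed implementation.

```java
import java.util.List;
import java.util.Objects;
import java.util.function.Function;
import java.util.stream.Collectors;

class FilterMapNullSketch {
    // Hypothetical helper: apply `f` to every item and drop the null results,
    // so a null return value means "skip this item".
    static <T, R> List<R> filterMap(List<T> items, Function<T, R> f) {
        return items.stream()
                .map(f)
                .filter(Objects::nonNull)
                .collect(Collectors.toList());
    }

    // Example closure: keep even numbers and multiply them by ten.
    static Integer tenfoldIfEven(Integer n) {
        return n % 2 == 0 ? n * 10 : null;
    }

    public static void main(String[] args) {
        List<Integer> out = filterMap(List.of(1, 2, 3, 4), FilterMapNullSketch::tenfoldIfEven);
        System.out.println(out); // prints [20, 40]
    }
}
```

The limitation is the one already noted: a mapping function that legitimately returns `null` cannot be told apart from a skip.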
> Groovy/Java (AFAIK) doesn't have an equivalent of `Option<T>`

I stand corrected: https://docs.oracle.com/javase/8/docs/api/java/util/Optional.html
I was just thinking about this the other day... I think I would support a `filterMap` operator using the `Optional` instead of `null`.
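For comparison, here is a rough sketch of what an `Optional`-based contract could look like, again in plain Java with a `List` standing in for a channel. The names `FilterMapOptionalSketch`, `filterMap` and `tenfoldIfEven` are hypothetical, and `Optional.stream()` assumes Java 9 or later.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Function;
import java.util.stream.Collectors;

class FilterMapOptionalSketch {
    // Hypothetical helper: the closure returns Optional<R>; empty means "skip".
    static <T, R> List<R> filterMap(List<T> items, Function<T, Optional<R>> f) {
        return items.stream()
                .map(f)
                .flatMap(Optional::stream) // Optional.stream() exists since Java 9
                .collect(Collectors.toList());
    }

    // Example closure: keep even numbers and multiply them by ten.
    static Optional<Integer> tenfoldIfEven(int n) {
        return n % 2 == 0 ? Optional.of(n * 10) : Optional.empty();
    }

    public static void main(String[] args) {
        List<Integer> out = filterMap(List.of(1, 2, 3, 4), FilterMapOptionalSketch::tenfoldIfEven);
        System.out.println(out); // prints [20, 40]
    }
}
```

Compared with the `null` convention, the skip is explicit in the return type, at the cost of wrapping every kept value.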
By the way, the `branch` operator is essentially a multi-filter-map 😆
myChannelA
| join(myChannelB, remainder: true)
| branch { meta, etc ->
    some: etc != null
        return meta
}.some
Clearly `filterMap` is less verbose.
The only thing is, as I've been investigating how to evolve the Nextflow language, I feel that operators are over-used, because they are often needed to fill gaps in the language. So I want to find ways to fill those gaps and make operators less necessary first, before we keep adding convenience operators and potentially further enable bad patterns. For example, I think we can make it so that `branch` and `multiMap` are not needed compared to `filter` and `map`.
That being said, I've also found `filter_map` to be useful in Rust, and it seems like a valuable enough convenience even if we manage to simplify the library and usage of operators. I definitely prefer it over `branch`.
Some weekend thoughts: you can also achieve a `filterMap` with `flatMap`:
myChannel
| flatMap { v -> someCriteria(v) ? [ someMap(v) ] : [] }
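The same trick carries over to other stream-like APIs: emit a one-element collection to keep a value and an empty one to drop it. Below is a minimal Java sketch of that idea, with a `List` standing in for the channel; `FilterMapViaFlatMapSketch` and `filterMapViaFlatMap` are hypothetical names, and `criteria` and `mapper` play the roles of `someCriteria` and `someMap`.

```java
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class FilterMapViaFlatMapSketch {
    // Hypothetical helper: a one-element stream keeps the mapped value,
    // an empty stream drops the item.
    static <T, R> List<R> filterMapViaFlatMap(List<T> items,
                                              Predicate<T> criteria,
                                              Function<T, R> mapper) {
        return items.stream()
                .flatMap(v -> criteria.test(v)
                        ? Stream.of(mapper.apply(v))
                        : Stream.<R>empty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> out = filterMapViaFlatMap(
                List.of("a", "bb", "ccc"),
                s -> s.length() > 1,   // stand-in for someCriteria
                String::toUpperCase);  // stand-in for someMap
        System.out.println(out); // prints [BB, CCC]
    }
}
```

Because the skip is signalled by an empty collection rather than by `null`, this form does not need to assume anything about what the mapping function returns.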