flowcraft
Merging syntax with new operator
At the moment, Flowcraft does not provide syntax in the pipeline string to merge the outputs of multiple components into a single component. To allow that, a new operator should be added and classified as the merge operator.
I propose the following syntax:
( (A | B ) > C ) | D ) > E
where the outputs of A & B would be given as input for C, and the outputs of C & D would be passed as input for E.
These modifications also require setting up the total number of accepted inputs on the merging components, instead of only accepting one main input.
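To illustrate what "total number of accepted inputs" could mean in practice, here is a minimal Python sketch. The helpers `merge_edges` and `accepted_inputs` are hypothetical names made up for this example and are not part of Flowcraft; the sketch only handles a single, non-nested merge such as (A | B) > C.

```python
from collections import Counter
from typing import List, Tuple


def merge_edges(expression: str) -> List[Tuple[str, str]]:
    """Turn a flat merge such as "(A | B) > C" into (source, target) edges.

    Hypothetical helper: nested forks and merges are not supported.
    """
    sources, operator, target = expression.partition(">")
    if not operator or not target.strip():
        raise ValueError("expected '(source | source ...) > target'")
    # Drop the surrounding parentheses, then split the fork into its lanes.
    names = [name.strip() for name in sources.strip().strip("()").split("|")]
    return [(name, target.strip()) for name in names]


def accepted_inputs(edges: List[Tuple[str, str]]) -> Counter:
    """Count how many main inputs each merging component has to accept."""
    return Counter(target for _, target in edges)


edges = merge_edges("(A | B) > C")
print(edges)                        # [('A', 'C'), ('B', 'C')]
print(accepted_inputs(edges)["C"])  # 2
```

In this reading, C would need to declare two main inputs instead of one, which is the change to the components that the proposal calls for.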
Does the existence of the > merge operator imply that the same expression without > is also meaningful: ((A | B) C) | D) E? I believe in another issue a user asked for this expression to be equivalent to ACE | BCE | DE. Is that also the implied intention of this proposal?
@sjackman the idea would be to have other checks that forbid that string without the > operator. We already have some checks for malformed strings, so it is 'just' a matter of adding more and/or editing the existing sanity tests.
The other issue with repeating processes rather than writing duplicated processes in different forks will be handled apart from this, since it is simpler to implement. Although of course the design options are linked.
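As a rough idea of what such an extra check could look like, here is a minimal Python sketch. The function name `missing_merge_operator` is hypothetical and this is not one of Flowcraft's existing sanity tests; it only assumes that a fork must never be followed directly by a component name.

```python
import re


def missing_merge_operator(pipeline: str) -> bool:
    """Return True if a fork closes and a component follows with no operator.

    Hypothetical check: a ')' followed by optional whitespace and then a
    word character means a component comes right after the fork, so the
    explicit '>' merge operator is missing.
    """
    return re.search(r"\)\s*\w", pipeline) is not None


print(missing_merge_operator("((A | B) C) | D) E"))          # True
print(missing_merge_operator("( (A | B ) > C ) | D ) > E"))  # False
```

A fork that follows a component, as in A (B | C), is not flagged, because the pattern only looks at what comes after a closing parenthesis.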
How about this syntax:
((A + B) C) + D) E
That indicates to me that A and B are the inputs to C. It would also allow for this:
(short | long | short + long) unicycler (bandage | quast)
in a single command to run a short-read assembly, long-read assembly, and hybrid assembly.
@sjackman I'm not completely sure how this example would work but I'm trying to wrap my head around it because we need to consider the changes required for parsing the string and how to transform that into something readable by the engine.
Your proposal seems cleaner. But there is a problem with the (A | A + B ) C syntax:
- Component C will receive data from both A and A + B. However, component C (like all components in flowcraft) is agnostic about the preceding and following components, which means that it will have no way to differentiate between the sample_ids that come from only A and the ones that come from A + B.
- In your example you expect three different assemblies, but in this case only one will be published by the C component.
Btw, that example could also be replicated with the syntax suggested by @bfrgoncalves:
assembly="unicycler (bandage | quast)"
(short $assembly | long $assembly | (short | long) > $assembly )
It's more verbose, but also seems more explicit in separating the different assemblies (or whatever components we use after the merge).
I'm not discarding your proposal, just discussing the pros and cons of these approaches.
In my proposal (A | B) C is equivalent to A C | B C, so (short | long | short + long) unicycler is equivalent to…
short unicycler
long unicycler
(short + long) unicycler
Does that answer your question?
Hmm, in that case the parser would repeat the C component for each lane under the hood, is that it?
Yes, that's right.
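The per-lane repetition confirmed above can be sketched in a few lines of Python. The function `expand_fork` is a hypothetical helper written for this example, not Flowcraft's parser, and it assumes a single fork followed by a tail with no further parentheses.

```python
import re
from typing import List


def expand_fork(pipeline: str) -> List[str]:
    """Repeat the components after a fork once per lane.

    Hypothetical sketch: accepts only '(lane | lane ...) tail' with no
    nested parentheses in either the fork or the tail.
    """
    match = re.fullmatch(
        r"\s*\((?P<lanes>[^()]*)\)\s*(?P<tail>[^()]*)", pipeline
    )
    if match is None:
        raise ValueError("expected '(lane | lane ...) tail' without nesting")
    tail = match.group("tail").strip()
    expanded = []
    for lane in match.group("lanes").split("|"):
        lane = lane.strip()
        # Keep merged inputs grouped so the expanded lane stays unambiguous.
        if "+" in lane:
            lane = f"({lane})"
        expanded.append(f"{lane} {tail}".strip())
    return expanded


for lane in expand_fork("(short | long | short + long) unicycler"):
    print(lane)
# short unicycler
# long unicycler
# (short + long) unicycler
```

A tail that itself contains a fork, such as unicycler (bandage | quast), would need a recursive parser and is rejected by this sketch.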