metafacture-core icon indicating copy to clipboard operation
metafacture-core copied to clipboard

Modules blocking on `closeStream()` may not forward their data in scripts with wormholes

Open cboehme opened this issue 12 years ago • 2 comments

The behaviour you observed is a bug indeed, @ah641054:

Using wormholes users can combine data from multiple flows:

Flow A -> @wormhole;
Flow B -> @wormhole;
@wormhole -> Flow C;

Flux first executes flow A and then flow B. Flow C is executed indirectly by receiving data from A and B. Once execution of A and B is finished, closeStream() is called on both flows. After having called closeStream() on flow A, all modules in A and C are closed. Calling closeStream() on flow B then only closes the remaining modules in B. Modules in C ignore the second call.

This behaviour is all well until a module in flow B attempts to send data prior to forwarding the closeStream() event (as count-triples does, for instance). This data will be send to the already closed modules in flow C.

Possible solutions for this problem are:

  • @mgeipel suggested to extend the Lifecycle interface with a flush() method which is called befor closeStream().
  • @cboehme suggested to add a wormhole module which acts as a barrier for closeStream() calls and forwards them only once each incoming flow has been closed.

cboehme avatar Jul 05 '13 15:07 cboehme

We requires significant framework changes. For the time being a quick fix should be implemented (in version 1.1)

cboehme avatar Jul 26 '13 13:07 cboehme

Provided a quick fix in 49f24df by adding wait-for-inputs(N) to Flux.

example

data1 | open-file | as-lines | @Y;
data2 | open-file | as-lines | @Y;

@Y|
wait-for-inputs("2")|
write(out);

See also https://github.com/culturegraph/metafacture-core/tree/master/examples/beacon/create

mgeipel avatar Jul 30 '13 08:07 mgeipel