euphoria
euphoria-local: Allow persisting an intermediate dataset
I tried to persist an intermediate dataset directly, but the sink stays empty:
Dataset<T> data = ... // non-empty
data.persist(sink);
Dataset<T> mapped = FlatMap.of(data)...
Persisting indirectly, on the output of an operator, seems to work as expected:
FlatMap.of(data)...output()
    .persist(sink);
Is it possible to either fix this or at least throw an exception?
https://github.com/seznam/euphoria/blob/3fe6816ca3af95abf5d881f43a0e3ae92e45f1af/euphoria-local/src/test/java/cz/seznam/euphoria/executor/local/LocalExecutorTest.java#L628
Thanks for the report. Does the provided test fail?
If so, can you please create a branch with this failing test?
Yes, the test fails.
Branch: https://github.com/seznam/euphoria/tree/tnovak/persist-test
Thanks, we'll look into this ;)
There is a fundamental flaw in the translation of a Flow into a runnable DAG. This flaw will be fixed by #256. Until then, I suggest a (slightly suboptimal and highly ugly) workaround:
- The problem is that no Dataset can both have an output sink and be consumed by another operator.
- There is no problem with a single Dataset being consumed by multiple operators.
- Therefore, the workaround is to add a (dummy) mapping operation between the output and the intermediate dataset:
Dataset<T> data = ... // non-empty
MapElements.of(data).using(e -> e).output().persist(sink);
Dataset<T> mapped = FlatMap.of(data)...
We will focus on the correct solution (#256), but until then, this seems to be the only way out.
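To make the constraint above concrete, here is a minimal toy model in plain Java. It is not Euphoria code: `PersistCheckSketch`, `Node`, `validate`, and `persistViaIdentity` are hypothetical names invented for this sketch, and the sink is just a string label. It shows one way a flow could reject a dataset that is both persisted and consumed (the exception asked for earlier in the thread), and why routing the persist through an identity operator avoids that situation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model only; none of these names come from Euphoria. */
public class PersistCheckSketch {

    /** A dataset node that records its sink and its downstream consumers. */
    static final class Node {
        final String name;
        final List<Node> consumers = new ArrayList<>();
        String sink; // null when the node is not persisted

        Node(String name) {
            this.name = name;
        }

        /** Registers a downstream operator and returns its output node. */
        Node consumedBy(String operatorName) {
            Node out = new Node(operatorName);
            consumers.add(out);
            return out;
        }

        void persist(String sinkName) {
            this.sink = sinkName;
        }
    }

    /** Fails fast when a node is both persisted and consumed. */
    static void validate(List<Node> nodes) {
        for (Node n : nodes) {
            if (n.sink != null && !n.consumers.isEmpty()) {
                throw new IllegalStateException(
                    "Dataset '" + n.name + "' is both persisted and consumed");
            }
        }
    }

    /** The workaround: persist an identity-mapped copy, not the node itself. */
    static Node persistViaIdentity(Node data, String sinkName) {
        Node copy = data.consumedBy("identity(" + data.name + ")");
        copy.persist(sinkName);
        return copy;
    }

    public static void main(String[] args) {
        Node data = new Node("data");
        Node flat = data.consumedBy("flatMap");
        Node copy = persistViaIdentity(data, "sink");
        // Passes: "data" has two consumers and no sink of its own.
        validate(Arrays.asList(data, flat, copy));
        System.out.println("workaround flow is valid");
    }
}
```

In this model the intermediate dataset ends up with two consumers and no sink, which matches the case described above as unproblematic. The real translation in Euphoria may differ.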