euphoria-local: Allowing persist of intermediate dataset

Open t-novak opened this issue 7 years ago • 6 comments

I tried to persist an intermediate dataset directly, but the sink ends up empty:

Dataset<T> data = ... // non-empty
data.persist(sink); // sink stays empty
Dataset<T> mapped = FlatMap.of(data)... // data is also consumed by another operator

It seems that indirect persist works as expected:

FlatMap.of(data)...output()
  .persist(sink);

Is it possible to either fix it or throw some exception at least?

t-novak avatar Jan 24 '18 14:01 t-novak

https://github.com/seznam/euphoria/blob/3fe6816ca3af95abf5d881f43a0e3ae92e45f1af/euphoria-local/src/test/java/cz/seznam/euphoria/executor/local/LocalExecutorTest.java#L628

t-novak avatar Jan 24 '18 14:01 t-novak

Thanks for the report. Does the provided test fail?

je-ik avatar Jan 24 '18 20:01 je-ik

If so, can you please create a branch with this failing test?

je-ik avatar Jan 24 '18 20:01 je-ik

Yes, the test fails.

Branch: https://github.com/seznam/euphoria/tree/tnovak/persist-test

t-novak avatar Jan 25 '18 07:01 t-novak

Thanks, we'll look into this ;)

dmvk avatar Jan 25 '18 08:01 dmvk

There is a fundamental flaw in the translation of a Flow into a runnable DAG. This flaw will be solved by #256; until then I suggest a (slightly suboptimal and highly ugly) workaround:

  • the problem is that no Dataset can both have an output sink and be consumed by another operator
  • a single Dataset can be consumed by multiple operators without any problem
  • therefore, the workaround is to add a (dummy) identity mapping operation between the intermediate dataset and the output sink:
Dataset<T> data = ... // non-empty
// persist the output of a dummy identity map instead of data itself
MapElements.of(data).using(e -> e).output().persist(sink);
Dataset<T> mapped = FlatMap.of(data)...

We will focus on the correct solution (#256), but until then, this seems to be the only way out.
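
To make the constraint above concrete, here is a minimal toy sketch of a flow graph with a check that rejects a dataset which has a sink and is also consumed by an operator, which is roughly the "throw some exception at least" behaviour asked for in the report. Everything in it (ToyFlow, Node, source, apply, persist, validate) is a hypothetical name made up for illustration; it is not Euphoria's API and says nothing about how the real Flow translation or #256 is implemented.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a flow graph. All names are hypothetical, NOT Euphoria's API.
final class ToyFlow {

    // A dataset node: tracks whether it has a sink and how many operators read it.
    static final class Node {
        final String name;
        boolean hasSink;
        int consumers;

        Node(String name) {
            this.name = name;
        }
    }

    private final List<Node> nodes = new ArrayList<>();

    // Creates a source dataset.
    Node source(String name) {
        Node n = new Node(name);
        nodes.add(n);
        return n;
    }

    // Adds an operator that reads `input` and returns the operator's output dataset.
    Node apply(String name, Node input) {
        input.consumers++;
        Node out = new Node(name);
        nodes.add(out);
        return out;
    }

    // Marks the dataset as persisted to some sink.
    void persist(Node n) {
        n.hasSink = true;
    }

    // Fails fast on the unsupported shape instead of silently leaving the sink empty.
    void validate() {
        for (Node n : nodes) {
            if (n.hasSink && n.consumers > 0) {
                throw new IllegalStateException(
                    "Dataset '" + n.name + "' has an output sink and is also consumed by "
                        + n.consumers + " operator(s)");
            }
        }
    }
}
```

In this toy model, persisting `data` directly and then applying an operator to it makes `validate()` throw, whereas persisting the output of an extra identity operator (the shape of the workaround above) passes, because the persisted node has no consumers and `data` itself has no sink.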

je-ik avatar Jan 25 '18 10:01 je-ik