euphoria
Hints should not be runtime specific and should describe data only
The current implementation of hints is not portable across runtimes. We should move hints to euphoria-core, and they should describe the dataset, not the operator implementation.
E.g., JoinHint.BroadcastHashJoin should become something like SizeHint.FITS_IN_MEMORY.
Every operator's output method will take a Hint parameter that gives the runner information to help it optimize execution of the next transformation:
public Dataset<OUT> output(Hint... hints)
E.g.:
Dataset<T> smallDataset = Filter.named("filter to small data")
.of(bigDataset)
.by(/* filter predicate */)
.output(SizeHint.FITS_IN_MEMORY);
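A minimal sketch of how such hints could be modelled, assuming Hint is a plain marker interface and SizeHint an enum implementing it. Only Hint, SizeHint.FITS_IN_MEMORY and output(Hint...) come from this proposal; HintedDataset and OutputBuilder are hypothetical stand-ins for the real Dataset and operator builders, used here only to show the hints travelling with the output.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

/** Marker for runtime-independent hints (assumed shape, not the actual euphoria-core type). */
interface Hint {}

/** Size hint from the proposal; FITS_IN_MEMORY is the only value named there. */
enum SizeHint implements Hint {
  FITS_IN_MEMORY
}

/** Hypothetical, stripped-down stand-in for Dataset that only carries its hints. */
final class HintedDataset<T> {
  private final Set<Hint> hints;

  HintedDataset(Hint... hints) {
    // Keep insertion order and make the set read-only for consumers.
    this.hints = Collections.unmodifiableSet(new LinkedHashSet<>(Arrays.asList(hints)));
  }

  Set<Hint> getHints() {
    return hints;
  }

  boolean hasHint(Hint hint) {
    return hints.contains(hint);
  }
}

/** Hypothetical final builder step exposing output(Hint...). */
final class OutputBuilder<T> {
  HintedDataset<T> output(Hint... hints) {
    // Calling output() with no arguments yields a dataset without hints.
    return new HintedDataset<>(hints);
  }
}
```

Because the hints are attached to the resulting dataset rather than to the operator, any runner can inspect them without knowing which operator produced the data.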
If we want to give more information about the transformation itself (rather than about the dataset), there will be a builder method:
described(String name, Hint... hints)
FlatMap.described("extract-something", Hint.CPU_EXPENSIVE)
.of(dataset)
.using(/* extracting function */)
.output();
Use cases where this will be useful:
- skew join (one side is small enough to fit in memory, a.k.a. broadcast hash join)
- estimating the parallelization level for a transformation
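The join use case could look roughly like this on the runner side. This is only a sketch under assumptions: JoinStrategy and JoinPlanner are hypothetical names, and Hint and SizeHint are declared again in their assumed shape so the snippet compiles on its own. The point is that the runner, not the user, maps a data-describing hint to a concrete join implementation.

```java
import java.util.Set;

/** Marker for runtime-independent hints (assumed shape). */
interface Hint {}

/** Size hint from the proposal. */
enum SizeHint implements Hint {
  FITS_IN_MEMORY
}

/** Hypothetical join strategies a runner might support. */
enum JoinStrategy {
  BROADCAST_HASH_JOIN,
  SHUFFLE_JOIN
}

/** Hypothetical runner-side planner that translates dataset hints into a strategy. */
final class JoinPlanner {
  private JoinPlanner() {}

  /** Picks a broadcast join when either input is hinted as fitting in memory. */
  static JoinStrategy choose(Set<? extends Hint> leftHints, Set<? extends Hint> rightHints) {
    boolean oneSideSmall =
        leftHints.contains(SizeHint.FITS_IN_MEMORY)
            || rightHints.contains(SizeHint.FITS_IN_MEMORY);
    return oneSideSmall ? JoinStrategy.BROADCAST_HASH_JOIN : JoinStrategy.SHUFFLE_JOIN;
  }
}
```

A runner without a broadcast join could simply ignore the hint, which is what keeps user code portable among runtimes.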