
Hints should not be runtime specific and should describe data only

Open dmvk opened this issue 7 years ago • 1 comments

The current implementation of hints is not portable across runtimes. We should move hints to euphoria-core, and they should describe the dataset, not the operator implementation.

E.g., JoinHint.BroadcastHashJoin should become something like SizeHint.FITS_IN_MEMORY.

dmvk avatar Feb 05 '18 07:02 dmvk

Every operator's output method will accept Hint parameters. They give the runner information that helps it optimize execution of the next transformation:

public Dataset<OUT> output(Hint... hints)

E.g.:


Dataset<T> smallDataset = Filter.named("filter to small data")
  .of(bigDataset)
  .by(/* filter predicate */)
  .output(SizeHint.FITS_IN_MEMORY);
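
To make the shape of the proposal concrete, here is a minimal sketch of how runtime-independent hints could be modelled. Only `Hint`, `SizeHint.FITS_IN_MEMORY` and `output(Hint... hints)` come from this issue; `HintSketch`, `OutputBuilder`, `getHints()` and the simplified `Dataset` are hypothetical stand-ins, not the actual euphoria-core API.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

final class HintSketch {

  /** Marker interface for hints that any runtime can interpret. */
  interface Hint {}

  /** Hints describing the dataset itself (name taken from the proposal). */
  enum SizeHint implements Hint { FITS_IN_MEMORY }

  /** Hypothetical, simplified dataset that only carries its hints. */
  static final class Dataset<T> {
    private final Set<Hint> hints;

    Dataset(Set<Hint> hints) {
      this.hints = Collections.unmodifiableSet(hints);
    }

    /** Hints a runner could inspect when planning the next transformation. */
    Set<Hint> getHints() {
      return hints;
    }
  }

  /** Hypothetical last step of an operator builder. */
  static final class OutputBuilder<OUT> {
    /** Attaches the given hints (possibly none) to the produced dataset. */
    Dataset<OUT> output(Hint... hints) {
      return new Dataset<>(new LinkedHashSet<>(Arrays.asList(hints)));
    }
  }
}
```

Because `output` is variadic, existing calls to `output()` with no arguments would keep compiling and simply produce a dataset with no hints.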

If we want to give more information about the transformation itself (not about the dataset), there will be a builder method:

described(String name, Hint... hints)
FlatMap.described("extract-something", Hint.CPU_EXPENSIVE)
  .of(dataset)
  .using(/* extracting function */)
  .output();

Use cases where it will be useful:

  • skew join, where one side is small enough to fit in memory (a.k.a. Broadcast Hash Join)
  • estimating the parallelism level for a transformation
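
As an illustration of the first use case, a runner could pick a join strategy from dataset hints alone. This is only a sketch under assumptions: `JoinPlannerSketch`, `JoinStrategy`, `choose` and the `SHUFFLE_JOIN` fallback are hypothetical names, and a real runner would likely weigh more inputs than a single hint.

```java
import java.util.EnumSet;
import java.util.Set;

final class JoinPlannerSketch {

  /** Dataset-level hint from the proposal. */
  enum SizeHint { FITS_IN_MEMORY }

  /** Hypothetical strategies a runner might choose between. */
  enum JoinStrategy { BROADCAST_HASH_JOIN, SHUFFLE_JOIN }

  /**
   * Chooses a broadcast hash join when either input is hinted to fit in
   * memory; otherwise falls back to a (hypothetical) shuffle-based join.
   */
  static JoinStrategy choose(Set<SizeHint> leftHints, Set<SizeHint> rightHints) {
    boolean oneSideIsSmall =
        leftHints.contains(SizeHint.FITS_IN_MEMORY)
            || rightHints.contains(SizeHint.FITS_IN_MEMORY);
    return oneSideIsSmall ? JoinStrategy.BROADCAST_HASH_JOIN : JoinStrategy.SHUFFLE_JOIN;
  }
}
```

The decision depends only on what the data looks like, so each runtime stays free to map the hint onto its own join implementation.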

mareksimunek avatar Feb 14 '18 11:02 mareksimunek