euphoria
Add parallelism control
After removing explicit partitioning, we currently have no explicit control over the parallelism of executing operators. This affects both batch and stream processing. There must be a way to hint to the translator that a certain operation should be parallelized more or less than its input. Options are:
- add a method to set the parallelism of an operator on the executor, e.g.
  Executor executor = ...;
  executor.withParallelism("OPERATOR_NAME", 100).submit(flow);
- add a downstream parallelism hint to shuffle operators, e.g.
  ReduceByKey.of(...)
      .keyBy(...)
      ....
      .withHint(Parallelism.of(100));
- some other option?
I think we should never set explicit parallelism. Instead, we should hint the operator with a percentage estimate of the increase or decrease in data size, so that parallelism can be decided based on the input data.
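
To make the size-hint idea concrete, here is a rough sketch of how a translator might turn such a hint into a parallelism value. Everything in it is an assumption for illustration only: the class `SizeHintSketch`, the method `deriveParallelism`, and the convention that the hint is a ratio of output size to input size (0.5 meaning the data shrinks to 50 %, 2.0 meaning it doubles) are hypothetical and not part of the existing API.

```java
// Hypothetical sketch, not existing euphoria API.
final class SizeHintSketch {

    private SizeHintSketch() {}

    /**
     * Derives downstream parallelism from the parallelism of the input
     * and a hinted output/input size ratio (assumed convention).
     * The result is rounded up and never drops below 1.
     */
    static int deriveParallelism(int inputParallelism, double sizeRatio) {
        if (inputParallelism < 1) {
            throw new IllegalArgumentException("inputParallelism must be >= 1");
        }
        if (Double.isNaN(sizeRatio) || sizeRatio <= 0.0) {
            throw new IllegalArgumentException("sizeRatio must be > 0");
        }
        // Scale the input parallelism by the expected change in data size.
        double scaled = Math.ceil(inputParallelism * sizeRatio);
        // Clamp to the valid int range, keeping at least one partition.
        return (int) Math.max(1.0, Math.min(scaled, (double) Integer.MAX_VALUE));
    }

    public static void main(String[] args) {
        // Input arrives in 8 partitions; the operator is hinted to halve the data.
        System.out.println(deriveParallelism(8, 0.5));
        // Input arrives in 10 partitions; the operator is hinted to double the data.
        System.out.println(deriveParallelism(10, 2.0));
    }
}
```

With this approach the user never states an absolute number, so the same flow adapts when the input grows or shrinks. Rounding up and clamping to at least 1 are design choices of the sketch; a real translator could also apply an executor-specific upper bound.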