
[Discussion] Enhance evaluation module to facilitate JIT

Open wjsi opened this issue 3 years ago • 3 comments

Mars implements DataFrame.eval and collects operands (in https://github.com/mars-project/mars/blob/master/mars/optimization/logical/tileable/arithmetic_query.py) that can be combined into a string expression accepted by pandas eval. This improves efficiency, but the implementation has drawbacks.

  1. Column indexes with non-string labels, for instance a MultiIndex, are not supported by eval.
  2. Non-arithmetic chunk-by-chunk operands are not well supported.
  3. Tensor fusion is not supported.

To handle these issues, the current optimization implementation needs to be enhanced. Instead of passing expression strings, it should build an expression DAG with fused expressions.

We may use the Mars expression DAG itself to represent those evaluation DAGs. An evaluation expression starts with a Fetch node that accepts the chunk itself, and it outputs one or more chunks as results. All operations inside the evaluation DAG must be chunk-by-chunk operands.
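The structure described above can be sketched in plain Python. This is only an illustration of the idea, not Mars code: `Node`, `Fetch`, the `chunk_wise` flag and `validate` are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


# Hypothetical names throughout; none of these classes are Mars APIs.
@dataclass
class Node:
    name: str
    inputs: List["Node"] = field(default_factory=list)
    chunk_wise: bool = True  # True if the operand maps one chunk to one chunk


@dataclass
class Fetch(Node):
    """Entry point of an evaluation DAG: stands for the incoming chunk."""


def validate(outputs: Sequence[Node]) -> None:
    """Check that the DAG starts at Fetch nodes and uses only chunk-wise operands."""
    stack, seen = list(outputs), set()
    while stack:
        node = stack.pop()
        if id(node) in seen:
            continue
        seen.add(id(node))
        if not node.chunk_wise:
            raise ValueError(f"{node.name} is not a chunk-by-chunk operand")
        if not node.inputs and not isinstance(node, Fetch):
            raise ValueError(f"{node.name} has no inputs and is not a Fetch node")
        stack.extend(node.inputs)


# Usage: one Fetch node feeding a two-step chain with two output chunks.
fetch = Fetch("chunk")
add_one = Node("add_one", [fetch])
double = Node("double", [add_one])
validate([add_one, double])  # accepted: every path ends at the Fetch node
```

An operand that needs data from other chunks (a global sort, for example) would be marked `chunk_wise=False` in this sketch and rejected by `validate`.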

After the evaluation DAGs are generated, the related operands are condensed into an Evaluate operand with an evaluation_dag operand. Once it is submitted to a supervisor, cluster-based optimization can also be applied before tiling. After this operand is tiled, a chunk-to-chunk plan is generated and then executed.
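A rough sketch of that flow, under strong simplifying assumptions: the evaluation DAG is reduced to a linear chain, chunks are plain Python lists, and `Step`, `condense` and `tile_and_run` are hypothetical helpers. Only the names `Evaluate` and `evaluation_dag` come from the proposal; their real shape in Mars may differ.

```python
from dataclasses import dataclass
from typing import Callable, List

Chunk = List[float]


@dataclass
class Step:
    # Hypothetical stand-in for a chunk-by-chunk operand.
    name: str
    func: Callable[[Chunk], Chunk]  # chunk in, chunk out


@dataclass
class Evaluate:
    """Single operand standing in for a condensed chain of chunk-wise operands."""

    evaluation_dag: List[Step]  # a linear chain here; a real DAG could branch

    def execute(self, chunk: Chunk) -> Chunk:
        # The Fetch node is implicit: `chunk` is the fetched input.
        for step in self.evaluation_dag:
            chunk = step.func(chunk)
        return chunk


def condense(steps: List[Step]) -> Evaluate:
    """Replace a chain of operands with one Evaluate operand."""
    return Evaluate(evaluation_dag=list(steps))


def tile_and_run(op: Evaluate, chunks: List[Chunk]) -> List[Chunk]:
    # After tiling, each input chunk maps to exactly one output chunk.
    return [op.execute(c) for c in chunks]


steps = [
    Step("add_one", lambda c: [x + 1.0 for x in c]),
    Step("double", lambda c: [x * 2.0 for x in c]),
]
result = tile_and_run(condense(steps), [[1.0, 2.0], [3.0]])
print(result)  # [[4.0, 6.0], [8.0]]
```

Because the whole chain travels as one operand, an optimizer can inspect or rewrite `evaluation_dag` before the per-chunk plan is produced.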

wjsi avatar Mar 16 '22 08:03 wjsi

Looks reasonable to me. For now, I will implement a JIT version for a situation similar to arithmetic_query.py, to see whether JIT can be faster than numexpr and which APIs or utility functions the JIT tool still lacks...

dlee992 avatar Mar 17 '22 07:03 dlee992

What is tensor fusion? Can you give an example?

dlee992 avatar Mar 21 '22 03:03 dlee992

> What is tensor fusion? Can you give an example?

A chain of chunk-wise tensor operands.

wjsi avatar Mar 22 '22 12:03 wjsi
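To make the answer above concrete, here is one possible reading of fusing a chain of chunk-wise operands, sketched in plain Python. It assumes element-wise operands and list-based chunks; `run_unfused`, `fuse` and `run_fused` are hypothetical names, not Mars functions.

```python
from functools import reduce
from typing import Callable, List

Chunk = List[float]
ElemOp = Callable[[float], float]


def run_unfused(chunk: Chunk, ops: List[ElemOp]) -> Chunk:
    # Each operand materializes a full intermediate chunk.
    for op in ops:
        chunk = [op(x) for x in chunk]
    return chunk


def fuse(ops: List[ElemOp]) -> ElemOp:
    """Compose a chain of element-wise operands into a single function."""

    def fused(x: float) -> float:
        return reduce(lambda acc, op: op(acc), ops, x)

    return fused


def run_fused(chunk: Chunk, ops: List[ElemOp]) -> Chunk:
    # One pass over the chunk, no intermediate chunks.
    f = fuse(ops)
    return [f(x) for x in chunk]


ops = [lambda x: x + 1.0, lambda x: x * 2.0, abs]
chunk = [-3.0, 0.5]
print(run_unfused(chunk, ops))  # [4.0, 3.0]
print(run_fused(chunk, ops))    # [4.0, 3.0]
```

Both paths give the same result; the fused version visits each element once, which is the kind of chain a JIT compiler could turn into a single kernel.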