datafusion icon indicating copy to clipboard operation
datafusion copied to clipboard

Stop copying LogicalPlan and Exprs in `TypeCoercion` (10% faster planning)

Open alamb opened this issue 1 year ago • 0 comments

Note it has code from https://github.com/apache/datafusion/pull/10410 so that might good to review first

Which issue does this PR close?

Closes https://github.com/apache/datafusion/issues/10210

Part of https://github.com/apache/arrow-datafusion/issues/9637 -- let's make DataFusion planning faster by not copying so much

Rationale for this change

Now that we have the nice TreeNode API thanks to #8913 and @peter-toth let's use it to both simplify the code and avoid copies

What changes are included in this PR?

  1. rewrite TypeCoercion via TreeNode API
  2. Introduce LogicalPlan::recompute_schema to recompute the schema after expressions in a plan are changed

Are these changes tested?

Existing CI

Are there any user-facing changes?

Faster planning:

  1. 12% faster TPCH planning
  2. 10% faster TPCDS planning
Details

group                                         main                                   type_coercion
-----                                         ----                                   -------------
logical_aggregate_with_join                   1.00  1221.8±12.51µs        ? ?/sec    1.00  1220.4±11.13µs        ? ?/sec
logical_plan_tpcds_all                        1.00    160.3±1.91ms        ? ?/sec    1.00    159.5±1.81ms        ? ?/sec
logical_plan_tpch_all                         1.02     17.4±0.22ms        ? ?/sec    1.00     17.0±0.20ms        ? ?/sec
logical_select_all_from_1000                  1.00     18.7±0.10ms        ? ?/sec    1.01     18.9±0.17ms        ? ?/sec
logical_select_one_from_700                   1.00   808.2±23.16µs        ? ?/sec    1.01   817.7±10.21µs        ? ?/sec
logical_trivial_join_high_numbered_columns    1.00    763.6±7.80µs        ? ?/sec    1.00   763.1±13.86µs        ? ?/sec
logical_trivial_join_low_numbered_columns     1.00   748.3±10.69µs        ? ?/sec    1.00   747.1±15.11µs        ? ?/sec
physical_plan_tpcds_all                       1.12  1509.7±10.59ms        ? ?/sec    1.00   1350.0±8.12ms        ? ?/sec
physical_plan_tpch_all                        1.10    102.2±1.80ms        ? ?/sec    1.00     93.1±1.49ms        ? ?/sec
physical_plan_tpch_q1                         1.09      5.7±0.07ms        ? ?/sec    1.00      5.2±0.06ms        ? ?/sec
physical_plan_tpch_q10                        1.06      4.8±0.10ms        ? ?/sec    1.00      4.5±0.05ms        ? ?/sec
physical_plan_tpch_q11                        1.05      4.2±0.10ms        ? ?/sec    1.00      4.0±0.08ms        ? ?/sec
physical_plan_tpch_q12                        1.07      3.4±0.06ms        ? ?/sec    1.00      3.2±0.06ms        ? ?/sec
physical_plan_tpch_q13                        1.04      2.3±0.06ms        ? ?/sec    1.00      2.2±0.05ms        ? ?/sec
physical_plan_tpch_q14                        1.03      3.0±0.05ms        ? ?/sec    1.00      2.9±0.06ms        ? ?/sec
physical_plan_tpch_q16                        1.06      4.1±0.06ms        ? ?/sec    1.00      3.9±0.07ms        ? ?/sec
physical_plan_tpch_q17                        1.04      3.8±0.05ms        ? ?/sec    1.00      3.7±0.05ms        ? ?/sec
physical_plan_tpch_q18                        1.07      4.3±0.06ms        ? ?/sec    1.00      4.0±0.06ms        ? ?/sec
physical_plan_tpch_q19                        1.28      8.1±0.11ms        ? ?/sec    1.00      6.3±0.08ms        ? ?/sec
physical_plan_tpch_q2                         1.12      8.8±0.08ms        ? ?/sec    1.00      7.9±0.08ms        ? ?/sec
physical_plan_tpch_q20                        1.10      5.1±0.09ms        ? ?/sec    1.00      4.6±0.09ms        ? ?/sec
physical_plan_tpch_q21                        1.10      6.9±0.10ms        ? ?/sec    1.00      6.3±0.07ms        ? ?/sec
physical_plan_tpch_q22                        1.08      3.8±0.09ms        ? ?/sec    1.00      3.5±0.09ms        ? ?/sec
physical_plan_tpch_q3                         1.04      3.4±0.06ms        ? ?/sec    1.00      3.3±0.06ms        ? ?/sec
physical_plan_tpch_q4                         1.06      2.5±0.05ms        ? ?/sec    1.00      2.3±0.04ms        ? ?/sec
physical_plan_tpch_q5                         1.12      5.1±0.09ms        ? ?/sec    1.00      4.6±0.09ms        ? ?/sec
physical_plan_tpch_q6                         1.13  1863.3±56.21µs        ? ?/sec    1.00  1643.3±80.88µs        ? ?/sec
physical_plan_tpch_q7                         1.13      6.5±0.11ms        ? ?/sec    1.00      5.8±0.11ms        ? ?/sec
physical_plan_tpch_q8                         1.11      8.4±0.10ms        ? ?/sec    1.00      7.6±0.06ms        ? ?/sec
physical_plan_tpch_q9                         1.09      6.3±0.09ms        ? ?/sec    1.00      5.8±0.06ms        ? ?/sec
physical_select_all_from_1000                 1.45     88.7±0.40ms        ? ?/sec    1.00     61.3±0.45ms        ? ?/sec
physical_select_one_from_700                  1.05      3.9±0.05ms        ? ?/sec    1.00      3.7±0.05ms        ? ?/sec

alamb avatar May 02 '24 19:05 alamb