datafusion
datafusion copied to clipboard
Stop copying LogicalPlan and Exprs in `TypeCoercion` (10% faster planning)
Note it has code from https://github.com/apache/datafusion/pull/10410 so that might good to review first
Which issue does this PR close?
Closes https://github.com/apache/datafusion/issues/10210
Part of https://github.com/apache/arrow-datafusion/issues/9637 -- let's make DataFusion planning faster by not copying so much
Rationale for this change
Now that we have the nice TreeNode API thanks to #8913 and @peter-toth let's use it to both simplify the code and avoid copies
What changes are included in this PR?
- rewrite
TypeCoercionvia TreeNode API - Introduce
LogicalPlan::recompute_schemato recompute the schema after expressions in a plan are changed
Are these changes tested?
Existing CI
Are there any user-facing changes?
Faster planning:
- 12% faster TPCH planning
- 10% faster TPCDS planning
Details
group main type_coercion
----- ---- -------------
logical_aggregate_with_join 1.00 1221.8±12.51µs ? ?/sec 1.00 1220.4±11.13µs ? ?/sec
logical_plan_tpcds_all 1.00 160.3±1.91ms ? ?/sec 1.00 159.5±1.81ms ? ?/sec
logical_plan_tpch_all 1.02 17.4±0.22ms ? ?/sec 1.00 17.0±0.20ms ? ?/sec
logical_select_all_from_1000 1.00 18.7±0.10ms ? ?/sec 1.01 18.9±0.17ms ? ?/sec
logical_select_one_from_700 1.00 808.2±23.16µs ? ?/sec 1.01 817.7±10.21µs ? ?/sec
logical_trivial_join_high_numbered_columns 1.00 763.6±7.80µs ? ?/sec 1.00 763.1±13.86µs ? ?/sec
logical_trivial_join_low_numbered_columns 1.00 748.3±10.69µs ? ?/sec 1.00 747.1±15.11µs ? ?/sec
physical_plan_tpcds_all 1.12 1509.7±10.59ms ? ?/sec 1.00 1350.0±8.12ms ? ?/sec
physical_plan_tpch_all 1.10 102.2±1.80ms ? ?/sec 1.00 93.1±1.49ms ? ?/sec
physical_plan_tpch_q1 1.09 5.7±0.07ms ? ?/sec 1.00 5.2±0.06ms ? ?/sec
physical_plan_tpch_q10 1.06 4.8±0.10ms ? ?/sec 1.00 4.5±0.05ms ? ?/sec
physical_plan_tpch_q11 1.05 4.2±0.10ms ? ?/sec 1.00 4.0±0.08ms ? ?/sec
physical_plan_tpch_q12 1.07 3.4±0.06ms ? ?/sec 1.00 3.2±0.06ms ? ?/sec
physical_plan_tpch_q13 1.04 2.3±0.06ms ? ?/sec 1.00 2.2±0.05ms ? ?/sec
physical_plan_tpch_q14 1.03 3.0±0.05ms ? ?/sec 1.00 2.9±0.06ms ? ?/sec
physical_plan_tpch_q16 1.06 4.1±0.06ms ? ?/sec 1.00 3.9±0.07ms ? ?/sec
physical_plan_tpch_q17 1.04 3.8±0.05ms ? ?/sec 1.00 3.7±0.05ms ? ?/sec
physical_plan_tpch_q18 1.07 4.3±0.06ms ? ?/sec 1.00 4.0±0.06ms ? ?/sec
physical_plan_tpch_q19 1.28 8.1±0.11ms ? ?/sec 1.00 6.3±0.08ms ? ?/sec
physical_plan_tpch_q2 1.12 8.8±0.08ms ? ?/sec 1.00 7.9±0.08ms ? ?/sec
physical_plan_tpch_q20 1.10 5.1±0.09ms ? ?/sec 1.00 4.6±0.09ms ? ?/sec
physical_plan_tpch_q21 1.10 6.9±0.10ms ? ?/sec 1.00 6.3±0.07ms ? ?/sec
physical_plan_tpch_q22 1.08 3.8±0.09ms ? ?/sec 1.00 3.5±0.09ms ? ?/sec
physical_plan_tpch_q3 1.04 3.4±0.06ms ? ?/sec 1.00 3.3±0.06ms ? ?/sec
physical_plan_tpch_q4 1.06 2.5±0.05ms ? ?/sec 1.00 2.3±0.04ms ? ?/sec
physical_plan_tpch_q5 1.12 5.1±0.09ms ? ?/sec 1.00 4.6±0.09ms ? ?/sec
physical_plan_tpch_q6 1.13 1863.3±56.21µs ? ?/sec 1.00 1643.3±80.88µs ? ?/sec
physical_plan_tpch_q7 1.13 6.5±0.11ms ? ?/sec 1.00 5.8±0.11ms ? ?/sec
physical_plan_tpch_q8 1.11 8.4±0.10ms ? ?/sec 1.00 7.6±0.06ms ? ?/sec
physical_plan_tpch_q9 1.09 6.3±0.09ms ? ?/sec 1.00 5.8±0.06ms ? ?/sec
physical_select_all_from_1000 1.45 88.7±0.40ms ? ?/sec 1.00 61.3±0.45ms ? ?/sec
physical_select_one_from_700 1.05 3.9±0.05ms ? ?/sec 1.00 3.7±0.05ms ? ?/sec