datafusion
datafusion copied to clipboard
Extract CSE logic to `datafusion_common`
Which issue does this PR close?
Part of https://github.com/apache/datafusion/issues/12599.
Rationale for this change
This PR moves common subexpression elimination logic to datafusion_common to be able to share it with physical expressions.
What changes are included in this PR?
This PR:
- Moves CSE logic to
datafusion_common::cse. The logic is splited toCSEstruct, that does the elimination of common subtrees andCSEControllertrait, that can be implemented for differentTreeNodetypes. The PR contains 2 implementations.ExprCSEControllercan be used to eliminateExprcommon subtrees, andTestTreeNodeCSEControlleris used withTestTreeNodes in tests. A newPhysicalExprCSEControllerwill be added in a follow-up PR to eliminate commondyn PhysicalExprsubtrees. - Extracts the
HashNodetrait to be able to implement it fordyn PhysicalExprin a folllow-up PR. - Makes
TestTreeNodecrate public for CSE tests.
Are these changes tested?
Yes, with existing but refactored tests.
Are there any user-facing changes?
No.
cc @alamb
FYI @andygrove -- CSE on physical plans is happening
I ran the benchmarks and there is no change in performance 🚀
++ critcmp main extract-cse-logic
group extract-cse-logic main
----- ----------------- ----
logical_aggregate_with_join 1.01 1420.8±40.68µs ? ?/sec 1.00 1408.6±14.31µs ? ?/sec
logical_plan_tpcds_all 1.00 194.6±1.41ms ? ?/sec 1.00 194.1±2.83ms ? ?/sec
logical_plan_tpch_all 1.00 24.0±0.24ms ? ?/sec 1.01 24.1±0.30ms ? ?/sec
logical_select_all_from_1000 1.01 5.4±0.06ms ? ?/sec 1.00 5.3±0.05ms ? ?/sec
logical_select_one_from_700 1.00 1130.2±22.28µs ? ?/sec 1.00 1127.6±13.80µs ? ?/sec
logical_trivial_join_high_numbered_columns 1.00 1091.3±34.33µs ? ?/sec 1.00 1094.4±27.40µs ? ?/sec
logical_trivial_join_low_numbered_columns 1.01 1073.2±28.06µs ? ?/sec 1.00 1065.6±17.13µs ? ?/sec
physical_plan_tpcds_all 1.01 1237.2±21.72ms ? ?/sec 1.00 1228.5±13.18ms ? ?/sec
physical_plan_tpch_all 1.00 80.3±0.50ms ? ?/sec 1.00 80.5±0.72ms ? ?/sec
physical_plan_tpch_q1 1.02 3.0±0.05ms ? ?/sec 1.00 2.9±0.02ms ? ?/sec
physical_plan_tpch_q10 1.03 4.2±0.10ms ? ?/sec 1.00 4.1±0.09ms ? ?/sec
physical_plan_tpch_q11 1.02 3.5±0.07ms ? ?/sec 1.00 3.5±0.05ms ? ?/sec
physical_plan_tpch_q12 1.01 2.8±0.04ms ? ?/sec 1.00 2.8±0.02ms ? ?/sec
physical_plan_tpch_q13 1.00 2.2±0.04ms ? ?/sec 1.02 2.3±0.04ms ? ?/sec
physical_plan_tpch_q14 1.00 2.5±0.02ms ? ?/sec 1.02 2.6±0.06ms ? ?/sec
physical_plan_tpch_q16 1.00 3.5±0.08ms ? ?/sec 1.00 3.6±0.07ms ? ?/sec
physical_plan_tpch_q17 1.00 3.2±0.03ms ? ?/sec 1.01 3.2±0.04ms ? ?/sec
physical_plan_tpch_q18 1.02 3.8±0.08ms ? ?/sec 1.00 3.7±0.03ms ? ?/sec
physical_plan_tpch_q19 1.00 5.1±0.07ms ? ?/sec 1.00 5.1±0.05ms ? ?/sec
physical_plan_tpch_q2 1.01 6.7±0.14ms ? ?/sec 1.00 6.7±0.04ms ? ?/sec
physical_plan_tpch_q20 1.00 4.1±0.04ms ? ?/sec 1.00 4.1±0.04ms ? ?/sec
physical_plan_tpch_q21 1.00 5.3±0.03ms ? ?/sec 1.00 5.4±0.03ms ? ?/sec
physical_plan_tpch_q22 1.00 3.1±0.03ms ? ?/sec 1.04 3.3±0.10ms ? ?/sec
physical_plan_tpch_q3 1.01 3.1±0.07ms ? ?/sec 1.00 3.0±0.05ms ? ?/sec
physical_plan_tpch_q4 1.00 2.3±0.03ms ? ?/sec 1.00 2.3±0.05ms ? ?/sec
physical_plan_tpch_q5 1.00 4.1±0.07ms ? ?/sec 1.01 4.2±0.09ms ? ?/sec
physical_plan_tpch_q6 1.01 1685.3±43.27µs ? ?/sec 1.00 1667.0±27.51µs ? ?/sec
physical_plan_tpch_q7 1.00 5.2±0.04ms ? ?/sec 1.02 5.3±0.13ms ? ?/sec
physical_plan_tpch_q8 1.02 6.4±0.17ms ? ?/sec 1.00 6.3±0.12ms ? ?/sec
physical_plan_tpch_q9 1.04 5.1±0.15ms ? ?/sec 1.00 4.9±0.03ms ? ?/sec
physical_select_all_from_1000 1.01 40.6±0.19ms ? ?/sec 1.00 40.1±0.89ms ? ?/sec
physical_select_one_from_700 1.00 3.6±0.03ms ? ?/sec 1.00 3.6±0.03ms ? ?/sec
Thanks @alamb for the benchmark, I will address your comments tomorrow in this PR.
Thank you @peter-toth
I do think some of the suggestions I had for comments would help this code, but I also think we could do it as a follow on PR
I've fixed a few things in https://github.com/apache/datafusion/pull/13002/commits/1cae61cffb296fd7bd546e6d3f687f85496ab591 and addressed your comments.
Thanks for the review @alamb!