datafusion icon indicating copy to clipboard operation
datafusion copied to clipboard

Extract CSE logic to `datafusion_common`

Open peter-toth opened this issue 1 year ago • 5 comments

Which issue does this PR close?

Part of https://github.com/apache/datafusion/issues/12599.

Rationale for this change

This PR moves common subexpression elimination logic to datafusion_common to be able to share it with physical expressions.

What changes are included in this PR?

This PR:

  • Moves CSE logic to datafusion_common::cse. The logic is splited to CSE struct, that does the elimination of common subtrees and CSEController trait, that can be implemented for different TreeNode types. The PR contains 2 implementations.ExprCSEController can be used to eliminate Expr common subtrees, and TestTreeNodeCSEController is used with TestTreeNodes in tests. A new PhysicalExprCSEController will be added in a follow-up PR to eliminate common dyn PhysicalExpr subtrees.
  • Extracts the HashNode trait to be able to implement it for dyn PhysicalExpr in a folllow-up PR.
  • Makes TestTreeNode crate public for CSE tests.

Are these changes tested?

Yes, with existing but refactored tests.

Are there any user-facing changes?

No.

peter-toth avatar Oct 18 '24 13:10 peter-toth

cc @alamb

peter-toth avatar Oct 18 '24 13:10 peter-toth

FYI @andygrove -- CSE on physical plans is happening

alamb avatar Oct 18 '24 19:10 alamb

I ran the benchmarks and there is no change in performance 🚀

++ critcmp main extract-cse-logic
group                                         extract-cse-logic                      main
-----                                         -----------------                      ----
logical_aggregate_with_join                   1.01  1420.8±40.68µs        ? ?/sec    1.00  1408.6±14.31µs        ? ?/sec
logical_plan_tpcds_all                        1.00    194.6±1.41ms        ? ?/sec    1.00    194.1±2.83ms        ? ?/sec
logical_plan_tpch_all                         1.00     24.0±0.24ms        ? ?/sec    1.01     24.1±0.30ms        ? ?/sec
logical_select_all_from_1000                  1.01      5.4±0.06ms        ? ?/sec    1.00      5.3±0.05ms        ? ?/sec
logical_select_one_from_700                   1.00  1130.2±22.28µs        ? ?/sec    1.00  1127.6±13.80µs        ? ?/sec
logical_trivial_join_high_numbered_columns    1.00  1091.3±34.33µs        ? ?/sec    1.00  1094.4±27.40µs        ? ?/sec
logical_trivial_join_low_numbered_columns     1.01  1073.2±28.06µs        ? ?/sec    1.00  1065.6±17.13µs        ? ?/sec
physical_plan_tpcds_all                       1.01  1237.2±21.72ms        ? ?/sec    1.00  1228.5±13.18ms        ? ?/sec
physical_plan_tpch_all                        1.00     80.3±0.50ms        ? ?/sec    1.00     80.5±0.72ms        ? ?/sec
physical_plan_tpch_q1                         1.02      3.0±0.05ms        ? ?/sec    1.00      2.9±0.02ms        ? ?/sec
physical_plan_tpch_q10                        1.03      4.2±0.10ms        ? ?/sec    1.00      4.1±0.09ms        ? ?/sec
physical_plan_tpch_q11                        1.02      3.5±0.07ms        ? ?/sec    1.00      3.5±0.05ms        ? ?/sec
physical_plan_tpch_q12                        1.01      2.8±0.04ms        ? ?/sec    1.00      2.8±0.02ms        ? ?/sec
physical_plan_tpch_q13                        1.00      2.2±0.04ms        ? ?/sec    1.02      2.3±0.04ms        ? ?/sec
physical_plan_tpch_q14                        1.00      2.5±0.02ms        ? ?/sec    1.02      2.6±0.06ms        ? ?/sec
physical_plan_tpch_q16                        1.00      3.5±0.08ms        ? ?/sec    1.00      3.6±0.07ms        ? ?/sec
physical_plan_tpch_q17                        1.00      3.2±0.03ms        ? ?/sec    1.01      3.2±0.04ms        ? ?/sec
physical_plan_tpch_q18                        1.02      3.8±0.08ms        ? ?/sec    1.00      3.7±0.03ms        ? ?/sec
physical_plan_tpch_q19                        1.00      5.1±0.07ms        ? ?/sec    1.00      5.1±0.05ms        ? ?/sec
physical_plan_tpch_q2                         1.01      6.7±0.14ms        ? ?/sec    1.00      6.7±0.04ms        ? ?/sec
physical_plan_tpch_q20                        1.00      4.1±0.04ms        ? ?/sec    1.00      4.1±0.04ms        ? ?/sec
physical_plan_tpch_q21                        1.00      5.3±0.03ms        ? ?/sec    1.00      5.4±0.03ms        ? ?/sec
physical_plan_tpch_q22                        1.00      3.1±0.03ms        ? ?/sec    1.04      3.3±0.10ms        ? ?/sec
physical_plan_tpch_q3                         1.01      3.1±0.07ms        ? ?/sec    1.00      3.0±0.05ms        ? ?/sec
physical_plan_tpch_q4                         1.00      2.3±0.03ms        ? ?/sec    1.00      2.3±0.05ms        ? ?/sec
physical_plan_tpch_q5                         1.00      4.1±0.07ms        ? ?/sec    1.01      4.2±0.09ms        ? ?/sec
physical_plan_tpch_q6                         1.01  1685.3±43.27µs        ? ?/sec    1.00  1667.0±27.51µs        ? ?/sec
physical_plan_tpch_q7                         1.00      5.2±0.04ms        ? ?/sec    1.02      5.3±0.13ms        ? ?/sec
physical_plan_tpch_q8                         1.02      6.4±0.17ms        ? ?/sec    1.00      6.3±0.12ms        ? ?/sec
physical_plan_tpch_q9                         1.04      5.1±0.15ms        ? ?/sec    1.00      4.9±0.03ms        ? ?/sec
physical_select_all_from_1000                 1.01     40.6±0.19ms        ? ?/sec    1.00     40.1±0.89ms        ? ?/sec
physical_select_one_from_700                  1.00      3.6±0.03ms        ? ?/sec    1.00      3.6±0.03ms        ? ?/sec

alamb avatar Oct 20 '24 13:10 alamb

Thanks @alamb for the benchmark, I will address your comments tomorrow in this PR.

peter-toth avatar Oct 20 '24 13:10 peter-toth

Thank you @peter-toth

I do think some of the suggestions I had for comments would help this code, but I also think we could do it as a follow on PR

I've fixed a few things in https://github.com/apache/datafusion/pull/13002/commits/1cae61cffb296fd7bd546e6d3f687f85496ab591 and addressed your comments.

peter-toth avatar Oct 20 '24 18:10 peter-toth

Thanks for the review @alamb!

peter-toth avatar Oct 22 '24 08:10 peter-toth