datafusion
datafusion copied to clipboard
Cache output `equivalence_properties` in ProjectionExec
Which issue does this PR close?
Closes #9084.
Rationale for this change
Avoid potential exponential explosions in branching calls to output_partitioning
and equivalence_properties
in certain plan combinations.
What changes are included in this PR?
Simply store the equivalence_properties
, which are always calculated anyway in ProjectionExec
.
Are these changes tested?
They were tested on the example from https://github.com/apache/arrow-datafusion/issues/9084
Are there any user-facing changes?
Not really, apart from not stalling TPC-DS q64.
Out of curiousity I ran the sql_planner benchmark on this PR compared to c843226d84f79fa7ce99576ae8605ecef1a28484 (main at the time of writing).
Even though the PR is not targetted at those queries, the improvements seem to be in the range of no-change to -3% :+1:
Results
Running benches/sql_planner.rs (/home/svs/code/arrow-datafusion/target/release-nonlto/deps/sql_planner-89232c907fea5c25)
Gnuplot not found, using plotters backend
Benchmarking logical_select_one_from_700: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.7s, enable flat sampling, or reduce sample count to 60.
logical_select_one_from_700
time: [1.3148 ms 1.3211 ms 1.3283 ms]
change: [-1.2182% -0.5419% +0.1004%] (p = 0.11 > 0.05)
No change in performance detected.
Found 7 outliers among 100 measurements (7.00%)
5 (5.00%) high mild
2 (2.00%) high severe
physical_select_one_from_700
time: [4.5290 ms 4.5467 ms 4.5678 ms]
change: [-0.7308% -0.2853% +0.2090%] (p = 0.27 > 0.05)
No change in performance detected.
Found 5 outliers among 100 measurements (5.00%)
1 (1.00%) high mild
4 (4.00%) high severe
Benchmarking logical_trivial_join_low_numbered_columns: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.8s, enable flat sampling, or reduce sample count to 60.
logical_trivial_join_low_numbered_columns
time: [1.3374 ms 1.3413 ms 1.3454 ms]
change: [-1.0865% -0.5004% +0.0116%] (p = 0.06 > 0.05)
No change in performance detected.
Found 7 outliers among 100 measurements (7.00%)
1 (1.00%) low severe
1 (1.00%) low mild
4 (4.00%) high mild
1 (1.00%) high severe
Benchmarking logical_trivial_join_high_numbered_columns: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.9s, enable flat sampling, or reduce sample count to 50.
logical_trivial_join_high_numbered_columns
time: [1.3660 ms 1.3684 ms 1.3712 ms]
change: [-1.5172% -1.0961% -0.6794%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
2 (2.00%) high mild
3 (3.00%) high severe
Benchmarking logical_aggregate_with_join: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 9.1s, enable flat sampling, or reduce sample count to 50.
logical_aggregate_with_join
time: [1.7815 ms 1.7837 ms 1.7859 ms]
change: [-2.1258% -1.5940% -1.1366%] (p = 0.00 < 0.05)
Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
1 (1.00%) low mild
1 (1.00%) high mild
2 (2.00%) high severe
Benchmarking physical_plan_tpch_q1: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.9s, enable flat sampling, or reduce sample count to 60.
physical_plan_tpch_q1 time: [1.3576 ms 1.3596 ms 1.3617 ms]
change: [-1.0661% -0.4282% +0.3850%] (p = 0.26 > 0.05)
No change in performance detected.
Found 12 outliers among 100 measurements (12.00%)
2 (2.00%) low mild
4 (4.00%) high mild
6 (6.00%) high severe
Benchmarking physical_plan_tpch_q2: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 7.1s, enable flat sampling, or reduce sample count to 50.
physical_plan_tpch_q2 time: [1.3940 ms 1.3958 ms 1.3975 ms]
change: [-1.1803% -0.8095% -0.4489%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 6 outliers among 100 measurements (6.00%)
2 (2.00%) low severe
2 (2.00%) high mild
2 (2.00%) high severe
Benchmarking physical_plan_tpch_q3: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.3s, enable flat sampling, or reduce sample count to 60.
physical_plan_tpch_q3 time: [1.2505 ms 1.2523 ms 1.2541 ms]
change: [-1.5359% -1.0619% -0.6131%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 6 outliers among 100 measurements (6.00%)
1 (1.00%) low severe
1 (1.00%) low mild
2 (2.00%) high mild
2 (2.00%) high severe
Benchmarking physical_plan_tpch_q4: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.4s, enable flat sampling, or reduce sample count to 60.
physical_plan_tpch_q4 time: [1.2593 ms 1.2613 ms 1.2632 ms]
change: [-1.2966% -0.9442% -0.6166%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 9 outliers among 100 measurements (9.00%)
1 (1.00%) low severe
8 (8.00%) high mild
Benchmarking physical_plan_tpch_q5: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.6s, enable flat sampling, or reduce sample count to 60.
physical_plan_tpch_q5 time: [1.2976 ms 1.2992 ms 1.3009 ms]
change: [-1.2683% -0.8299% -0.3415%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 7 outliers among 100 measurements (7.00%)
1 (1.00%) low mild
2 (2.00%) high mild
4 (4.00%) high severe
Benchmarking physical_plan_tpch_q6: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.7s, enable flat sampling, or reduce sample count to 60.
physical_plan_tpch_q6 time: [1.1260 ms 1.1282 ms 1.1304 ms]
change: [-0.8453% -0.4227% -0.0237%] (p = 0.06 > 0.05)
No change in performance detected.
Found 3 outliers among 100 measurements (3.00%)
1 (1.00%) high mild
2 (2.00%) high severe
Benchmarking physical_plan_tpch_q7: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 7.6s, enable flat sampling, or reduce sample count to 50.
physical_plan_tpch_q7 time: [1.4897 ms 1.4930 ms 1.4976 ms]
change: [-1.1966% -0.5377% +0.1408%] (p = 0.13 > 0.05)
No change in performance detected.
Found 12 outliers among 100 measurements (12.00%)
1 (1.00%) low severe
5 (5.00%) high mild
6 (6.00%) high severe
Benchmarking physical_plan_tpch_q8: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 7.7s, enable flat sampling, or reduce sample count to 50.
physical_plan_tpch_q8 time: [1.5160 ms 1.5185 ms 1.5217 ms]
change: [-0.7303% -0.3073% +0.2294%] (p = 0.22 > 0.05)
No change in performance detected.
Found 6 outliers among 100 measurements (6.00%)
1 (1.00%) low mild
1 (1.00%) high mild
4 (4.00%) high severe
Benchmarking physical_plan_tpch_q9: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 7.1s, enable flat sampling, or reduce sample count to 50.
physical_plan_tpch_q9 time: [1.4044 ms 1.4062 ms 1.4081 ms]
change: [-2.2499% -1.5619% -0.9930%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 10 outliers among 100 measurements (10.00%)
2 (2.00%) low mild
6 (6.00%) high mild
2 (2.00%) high severe
Benchmarking physical_plan_tpch_q10: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.8s, enable flat sampling, or reduce sample count to 60.
physical_plan_tpch_q10 time: [1.3326 ms 1.3346 ms 1.3368 ms]
change: [-1.5355% -1.1162% -0.6136%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 16 outliers among 100 measurements (16.00%)
2 (2.00%) low severe
1 (1.00%) low mild
7 (7.00%) high mild
6 (6.00%) high severe
Benchmarking physical_plan_tpch_q11: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.5s, enable flat sampling, or reduce sample count to 60.
physical_plan_tpch_q11 time: [1.2844 ms 1.2861 ms 1.2880 ms]
change: [-1.0891% -0.7848% -0.4596%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 7 outliers among 100 measurements (7.00%)
3 (3.00%) low mild
3 (3.00%) high mild
1 (1.00%) high severe
Benchmarking physical_plan_tpch_q12: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.6s, enable flat sampling, or reduce sample count to 60.
physical_plan_tpch_q12 time: [1.2914 ms 1.2932 ms 1.2953 ms]
change: [-1.3540% -0.8637% -0.3279%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 12 outliers among 100 measurements (12.00%)
2 (2.00%) low mild
5 (5.00%) high mild
5 (5.00%) high severe
Benchmarking physical_plan_tpch_q13: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.0s, enable flat sampling, or reduce sample count to 60.
physical_plan_tpch_q13 time: [1.1891 ms 1.1914 ms 1.1937 ms]
change: [-1.2510% -0.8514% -0.3725%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 6 outliers among 100 measurements (6.00%)
2 (2.00%) low mild
2 (2.00%) high mild
2 (2.00%) high severe
Benchmarking physical_plan_tpch_q14: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.2s, enable flat sampling, or reduce sample count to 60.
physical_plan_tpch_q14 time: [1.2280 ms 1.2300 ms 1.2322 ms]
change: [-0.8543% -0.3613% +0.2262%] (p = 0.18 > 0.05)
No change in performance detected.
Found 11 outliers among 100 measurements (11.00%)
1 (1.00%) low mild
6 (6.00%) high mild
4 (4.00%) high severe
Benchmarking physical_plan_tpch_q16: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.3s, enable flat sampling, or reduce sample count to 60.
physical_plan_tpch_q16 time: [1.2386 ms 1.2406 ms 1.2427 ms]
change: [-1.6075% -1.0106% -0.4775%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 4 outliers among 100 measurements (4.00%)
1 (1.00%) low mild
1 (1.00%) high mild
2 (2.00%) high severe
Benchmarking physical_plan_tpch_q17: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.1s, enable flat sampling, or reduce sample count to 60.
physical_plan_tpch_q17 time: [1.2031 ms 1.2054 ms 1.2078 ms]
change: [-1.6412% -1.2875% -0.9239%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
1 (1.00%) low severe
1 (1.00%) low mild
1 (1.00%) high mild
2 (2.00%) high severe
Benchmarking physical_plan_tpch_q18: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.5s, enable flat sampling, or reduce sample count to 60.
physical_plan_tpch_q18 time: [1.2832 ms 1.2851 ms 1.2871 ms]
change: [-1.2041% -0.8586% -0.5098%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 6 outliers among 100 measurements (6.00%)
3 (3.00%) high mild
3 (3.00%) high severe
Benchmarking physical_plan_tpch_q19: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.9s, enable flat sampling, or reduce sample count to 60.
physical_plan_tpch_q19 time: [1.3537 ms 1.3567 ms 1.3602 ms]
change: [-4.1553% -2.8145% -1.6060%] (p = 0.00 < 0.05)
Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
4 (4.00%) high mild
8 (8.00%) high severe
Benchmarking physical_plan_tpch_q20: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.6s, enable flat sampling, or reduce sample count to 60.
physical_plan_tpch_q20 time: [1.2995 ms 1.3014 ms 1.3034 ms]
change: [-0.6861% -0.3842% -0.0702%] (p = 0.02 < 0.05)
Change within noise threshold.
Found 3 outliers among 100 measurements (3.00%)
1 (1.00%) low mild
2 (2.00%) high mild
Benchmarking physical_plan_tpch_q21: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 7.7s, enable flat sampling, or reduce sample count to 50.
physical_plan_tpch_q21 time: [1.5220 ms 1.5242 ms 1.5265 ms]
change: [-4.0173% -3.0381% -2.1649%] (p = 0.00 < 0.05)
Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
1 (1.00%) low severe
1 (1.00%) low mild
4 (4.00%) high mild
1 (1.00%) high severe
Benchmarking physical_plan_tpch_q22: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.9s, enable flat sampling, or reduce sample count to 50.
physical_plan_tpch_q22 time: [1.3675 ms 1.3692 ms 1.3711 ms]
change: [-1.9314% -1.3749% -0.8150%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 10 outliers among 100 measurements (10.00%)
1 (1.00%) low severe
2 (2.00%) low mild
3 (3.00%) high mild
4 (4.00%) high severe
physical_plan_tpch_all time: [27.848 ms 27.893 ms 27.941 ms]
change: [-0.7545% -0.5233% -0.2729%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
Thanks!
I wonder if we should add a similar optimization for other nodes where computing this is non trivial (like HashJoinExec or HashAggregateExec or FilterExec)
Yup, this makes sense to me. In fact AggregateExec
also always computes the eq_props when created and then throws it away (thus forcing re-computing it later on).
I could imagine even changing the signature of ExecutionPlan::equivalence_properties to return a reference and basically force pre-computing the results 🤔
This also sounds good. Arguably for other nodes this might be forcing an eager computation of something that might not be used in theory, but in practice I think these two methods always get invoked during the planning process, at least to propagate the info up the plan tree to something that does cache it now (might also need some profiling/investigating here too).
I can try and make these additional changes separately if other people think it could be a good idea. I also think breaking up DefaultPhysicalPlanner::create_initial_plan
s LogicalPlan::Join
case would be beneficial, since I think that's what's causing the stack overflow for the tpcds_physical_q64
test (which would then also cover the present issue).
@alamb let's wait on merging this. We want to think a little about this and have ideas for a different general solution. We should have something next week.
@alamb let's wait on merging this. We want to think a little about this and have ideas for a different general solution. We should have something next week.
This is fine with me if it is ok with @gruuya. Also given how small this PR is, if he needs the change sooner maybe we can consider merging it so it is available while the more general solution is underway. I don't feel strongly about this.
@ozankabak / @berkaysynnada is there a ticket somewhere that describes what you are doing ? It might help to make the plans more visible to others as well as I would like to point out we have a version of proejction pushdown in InfluxDB https://github.com/influxdata/influxdb/blob/main/iox_query/src/physical_optimizer/projection_pushdown.rs that might be worth a look as well
@ozankabak / @berkaysynnada is there a ticket somewhere that describes what you are doing ?
We will open the PR to the upstream next week. I will also request API-level suggestions there. It's not expected to significantly alter our current plans, but it will be a solid step towards optimizing potential outcomes following other existing optimizations and future ones. Thanks for the IOx solution, I will examine it.
@gruuya, let us know if it is very urgent for you. Then we can go ahead and merge for now and change again in a few days. Otherwise, we can collaborate next week on the more general approach we are working on.
@ozankabak / @berkaysynnada is there a ticket somewhere that describes what you are doing ?
We will open a ticket and a draft PR next week as an RFC for this
We will open the PR to the upstream next week
Just to be clear, I was suggesting a ticket Before the PR. Some reasons a ticket prior to PR might be be beneficial are:
- It may avoid potentially unecessary work like this PR as others would know what is coming
- Could provide an area to receive feeback before you code a particular approach up
- Would give people a high level idea of what is coming, and thus making reviews faster
BTW, for the sake clarity, there are two steps of the work here:
- Finding a general solution to expensive calls such as
equivalence_properties
, whose PR will come next week when @mustafasrepo comes back. - Improving the projection pushdown code (whose issue @berkaysynnada will open shortly and will create a PR for next week)
These two are loosely related but independent tasks
@alamb, @ozankabak this is not that urgent for us (though it would certainly be nice to have a fix soon).
My impression from https://github.com/apache/arrow-datafusion/issues/9084#issuecomment-1921137878 was that the general solution considered would actually be complementary to the one in this PR (i.e. locally caching stuff in the optimizer as opposed to caching it at the node level), hence possibly not necessitating changing it later on.
Otherwise, we can wait for the more general solution, thanks.
Just to be clear, I was suggesting a ticket Before the PR. Some reasons a ticket prior to PR might be be beneficial are:
I thought it would be easier to discuss through code, but you are right in the importance of filling a ticket. I just mean that this PR has nothing to do with the projection optimizer I'm currently working on.
@alamb let's wait on merging this. We want to think a little about this and have ideas for a different general solution. We should have something next week.
"a different general solution" is not the projection optimizer 😁
I have opened the issue as you suggest: https://github.com/apache/arrow-datafusion/issues/9111.
Converting to draft as we wait for additional potential improvements so it is clear this PR is not waiting on feedback
Update: In a couple days, we will open a PR with a solution that takes the core idea of this and generalizes it in a neat way (we think). Stay tuned!
Superseded by #9346