datafusion
datafusion copied to clipboard
[Epic] A collection of issues to improve planning performance / speed / efficiency
This is a collection of tickets related to making DataFusion's planning speed faster. Planning speed is the time from a SQL string being created to when the ExecutionPlan
is created
- [x] https://github.com/apache/arrow-datafusion/issues/8638
- [x] https://github.com/apache/arrow-datafusion/issues/4680
- [ ] https://github.com/apache/arrow-datafusion/issues/5309
- [ ] https://github.com/apache/arrow-datafusion/issues/4628
- [x] https://github.com/apache/arrow-datafusion/issues/3892
- [x] https://github.com/apache/arrow-datafusion/pull/5256
- [x] https://github.com/apache/arrow-datafusion/issues/7522
- [ ] https://github.com/apache/arrow-datafusion/issues/7698
- [x] https://github.com/apache/arrow-datafusion/pull/7942
- [ ] https://github.com/apache/arrow-datafusion/issues/9144
- [ ] https://github.com/apache/arrow-datafusion/issues/9577
- [ ] https://github.com/apache/arrow-datafusion/issues/9873
- [ ] https://github.com/apache/arrow-datafusion/issues/9637
- [x] https://github.com/apache/datafusion/issues/5157
- [x] https://github.com/apache/datafusion/issues/10426
Also I'd like to consider replace list in DFSchema by case_insensitive_hashmap or something similar in order to get value with O(1) complexity instead of O(N).
As I understand, now complexity is O(N^2) due two loops of iterations (datafusion_common::dfschema::DFSchema::index_of_column_by_name
and datafusion_common::table_reference::TableReference::resolved_eq
)
Yes, I think there is a lot of room for improvement (though we need to be careful about taking on crate dependencies that might not have a good long term maintenance story)
Here are some other recent discussions about how to improve planning speed:
- https://github.com/apache/arrow-datafusion/issues/9577
- https://github.com/apache/arrow-datafusion/issues/9637
An update here. Thanks to a bunch of work by @haohuaijin @matthewmturner @jayzhan211 @peter-toth @jackwener and myself, the planning speed on 38.0.0 is looking to be quite a bit better 20%-700% better in many cases. I am fairly confident there is still another factor of 2 to be had by completing https://github.com/apache/arrow-datafusion/issues/9637, which I expect to complete over the next few weeks
+ critcmp main 37.0.0
group 37.0.0 main
----- ------ ----
logical_aggregate_with_join 1.05 1271.9±10.23µs ? ?/sec 1.00 1210.1±16.14µs ? ?/sec
logical_plan_tpcds_all 1.07 167.2±1.37ms ? ?/sec 1.00 156.4±0.88ms ? ?/sec
logical_plan_tpch_all 1.01 17.2±0.18ms ? ?/sec 1.00 17.0±0.15ms ? ?/sec
logical_select_all_from_1000 4.84 93.5±0.41ms ? ?/sec 1.00 19.3±0.10ms ? ?/sec
logical_select_one_from_700 1.00 751.6±12.41µs ? ?/sec 1.06 795.9±8.14µs ? ?/sec
logical_trivial_join_high_numbered_columns 1.06 795.8±10.91µs ? ?/sec 1.00 750.1±8.38µs ? ?/sec
logical_trivial_join_low_numbered_columns 1.04 764.2±18.21µs ? ?/sec 1.00 737.4±18.35µs ? ?/sec
physical_plan_tpcds_all 1.46 2.2±0.01s ? ?/sec 1.00 1479.1±3.64ms ? ?/sec
physical_plan_tpch_all 1.35 134.5±0.81ms ? ?/sec 1.00 99.6±0.77ms ? ?/sec
physical_plan_tpch_q1 1.43 7.7±0.06ms ? ?/sec 1.00 5.4±0.07ms ? ?/sec
physical_plan_tpch_q10 1.38 6.4±0.05ms ? ?/sec 1.00 4.6±0.02ms ? ?/sec
physical_plan_tpch_q11 1.24 5.1±0.03ms ? ?/sec 1.00 4.1±0.03ms ? ?/sec
physical_plan_tpch_q12 1.25 4.1±0.02ms ? ?/sec 1.00 3.3±0.01ms ? ?/sec
physical_plan_tpch_q13 1.22 2.7±0.02ms ? ?/sec 1.00 2.2±0.01ms ? ?/sec
physical_plan_tpch_q14 1.22 3.5±0.02ms ? ?/sec 1.00 2.9±0.02ms ? ?/sec
physical_plan_tpch_q16 1.33 5.3±0.02ms ? ?/sec 1.00 4.0±0.02ms ? ?/sec
physical_plan_tpch_q17 1.29 4.9±0.03ms ? ?/sec 1.00 3.8±0.02ms ? ?/sec
physical_plan_tpch_q18 1.33 5.5±0.06ms ? ?/sec 1.00 4.1±0.02ms ? ?/sec
physical_plan_tpch_q19 1.29 10.1±0.09ms ? ?/sec 1.00 7.9±0.05ms ? ?/sec
physical_plan_tpch_q2 1.44 12.3±0.09ms ? ?/sec 1.00 8.5±0.06ms ? ?/sec
physical_plan_tpch_q20 1.32 6.4±0.05ms ? ?/sec 1.00 4.9±0.02ms ? ?/sec
physical_plan_tpch_q21 1.41 9.5±0.03ms ? ?/sec 1.00 6.8±0.06ms ? ?/sec
physical_plan_tpch_q22 1.29 4.7±0.03ms ? ?/sec 1.00 3.6±0.03ms ? ?/sec
physical_plan_tpch_q3 1.27 4.2±0.03ms ? ?/sec 1.00 3.3±0.02ms ? ?/sec
physical_plan_tpch_q4 1.39 3.4±0.02ms ? ?/sec 1.00 2.4±0.02ms ? ?/sec
physical_plan_tpch_q5 1.27 6.1±0.06ms ? ?/sec 1.00 4.8±0.03ms ? ?/sec
physical_plan_tpch_q6 1.17 2.1±0.01ms ? ?/sec 1.00 1752.6±12.06µs ? ?/sec
physical_plan_tpch_q7 1.39 8.7±0.08ms ? ?/sec 1.00 6.2±0.03ms ? ?/sec
physical_plan_tpch_q8 1.52 12.2±0.08ms ? ?/sec 1.00 8.0±0.03ms ? ?/sec
physical_plan_tpch_q9 1.53 9.2±0.06ms ? ?/sec 1.00 6.0±0.05ms ? ?/sec
physical_select_all_from_1000 7.42 683.8±1.12ms ? ?/sec 1.00 92.2±0.49ms ? ?/sec
physical_select_one_from_700 1.12 4.2±0.02ms ? ?/sec 1.00 3.7±0.04ms ? ?/sec
I compared 37.0.0
(with the tpcds benchmark) on this branch: https://github.com/alamb/arrow-datafusion/tree/alamb/37_bench
Comparison
```shell
set -x -e
## This script tests planning speed of 37.0.0 against the speed on planning on main
git fetch -p apache
git fetch -p alamb
# remove old test runs
rm -rf target/criterion/
# use a version of 37 with the tpcds benchmarks
BRANCH_NAME="37.0.0"
git checkout alamb/37_bench
git reset --hard alamb/alamb/37_bench
cargo update
cargo bench --bench sql_planner -- --save-baseline ${BRANCH_NAME}
echo "** Comparing to main"
git checkout main
git reset --hard apache/main
cargo update
cargo bench --bench sql_planner -- --save-baseline main
critcmp main ${BRANCH_NAME}
@alamb Hi, amazing work have been done! It's became much more speedy.
But it seems that the complexity of algorithms is still O(n^2) Here we have graph avg query execution time over the number of columns:
I agree there are still places that are N^2 in the number of columns.
With @haohuaijin 's great work in https://github.com/apache/datafusion/pull/9595 I think adding an index (perhaps computed on demand) to DFSchema
might be more tractable to do without causing performance regressions for smaller column counts.
It would be great if someone wanted to give that a try
We recently updated to the latest Datafusion and we've seen our planning time go from ~20ms to ~10ms! Great job on this.
We recently updated to the latest Datafusion and we've seen our planning time go from ~20ms to ~10ms! Great job on this.
That is great to hear --- thanks for the report @matthewmturner
BTW I think there is still significant improvement to be had by completing https://github.com/apache/datafusion/issues/9637. I don't think we'll get it all done by 38.0.0 but I think we'll improve it some more
Current progress
group 37.0.0 main
----- ------ ----
logical_aggregate_with_join 1.07 1281.7±14.79µs ? ?/sec 1.00 1198.9±14.54µs ? ?/sec
logical_plan_tpcds_all 1.09 172.4±1.92ms ? ?/sec 1.00 157.5±1.68ms ? ?/sec
logical_plan_tpch_all 1.06 17.7±0.24ms ? ?/sec 1.00 16.7±0.18ms ? ?/sec
logical_select_all_from_1000 5.17 96.2±0.48ms ? ?/sec 1.00 18.6±0.14ms ? ?/sec
logical_select_one_from_700 1.00 739.2±8.71µs ? ?/sec 1.09 809.4±35.23µs ? ?/sec
logical_trivial_join_high_numbered_columns 1.06 797.4±34.31µs ? ?/sec 1.00 750.7±8.05µs ? ?/sec
logical_trivial_join_low_numbered_columns 1.02 752.9±6.78µs ? ?/sec 1.00 738.4±9.07µs ? ?/sec
physical_plan_tpcds_all 1.65 2.2±0.01s ? ?/sec 1.00 1340.8±8.03ms ? ?/sec
physical_plan_tpch_all 1.54 139.9±1.25ms ? ?/sec 1.00 90.8±1.36ms ? ?/sec
physical_plan_tpch_q1 1.59 8.2±0.06ms ? ?/sec 1.00 5.1±0.07ms ? ?/sec
physical_plan_tpch_q10 1.53 6.6±0.06ms ? ?/sec 1.00 4.3±0.09ms ? ?/sec
physical_plan_tpch_q11 1.36 5.3±0.10ms ? ?/sec 1.00 3.9±0.06ms ? ?/sec
physical_plan_tpch_q12 1.42 4.3±0.10ms ? ?/sec 1.00 3.0±0.05ms ? ?/sec
physical_plan_tpch_q13 1.33 2.8±0.04ms ? ?/sec 1.00 2.1±0.02ms ? ?/sec
physical_plan_tpch_q14 1.35 3.7±0.07ms ? ?/sec 1.00 2.7±0.04ms ? ?/sec
physical_plan_tpch_q16 1.46 5.5±0.09ms ? ?/sec 1.00 3.7±0.07ms ? ?/sec
physical_plan_tpch_q17 1.46 5.1±0.08ms ? ?/sec 1.00 3.5±0.05ms ? ?/sec
physical_plan_tpch_q18 1.44 5.7±0.09ms ? ?/sec 1.00 3.9±0.09ms ? ?/sec
physical_plan_tpch_q19 1.65 10.3±0.09ms ? ?/sec 1.00 6.3±0.09ms ? ?/sec
physical_plan_tpch_q2 1.62 12.6±0.10ms ? ?/sec 1.00 7.8±0.11ms ? ?/sec
physical_plan_tpch_q20 1.46 6.7±0.09ms ? ?/sec 1.00 4.6±0.08ms ? ?/sec
physical_plan_tpch_q21 1.58 9.9±0.09ms ? ?/sec 1.00 6.2±0.09ms ? ?/sec
physical_plan_tpch_q22 1.47 5.0±0.07ms ? ?/sec 1.00 3.4±0.07ms ? ?/sec
physical_plan_tpch_q3 1.40 4.4±0.06ms ? ?/sec 1.00 3.1±0.05ms ? ?/sec
physical_plan_tpch_q4 1.50 3.5±0.06ms ? ?/sec 1.00 2.3±0.06ms ? ?/sec
physical_plan_tpch_q5 1.43 6.4±0.10ms ? ?/sec 1.00 4.4±0.08ms ? ?/sec
physical_plan_tpch_q6 1.37 2.1±0.05ms ? ?/sec 1.00 1572.0±16.95µs ? ?/sec
physical_plan_tpch_q7 1.60 8.9±0.11ms ? ?/sec 1.00 5.6±0.09ms ? ?/sec
physical_plan_tpch_q8 1.72 12.5±0.09ms ? ?/sec 1.00 7.3±0.11ms ? ?/sec
physical_plan_tpch_q9 1.71 9.5±0.10ms ? ?/sec 1.00 5.6±0.10ms ? ?/sec
physical_select_all_from_1000 11.52 701.4±1.22ms ? ?/sec 1.00 60.9±0.32ms ? ?/sec
physical_select_one_from_700 1.17 4.2±0.06ms ? ?/sec 1.00 3.6±0.03ms ? ?/sec
50% faster for tpcds and tpch planning
physical_plan_tpcds_all 1.65 2.2±0.01s ? ?/sec 1.00 1340.8±8.03ms ? ?/sec
physical_plan_tpch_all 1.54 139.9±1.25ms ? ?/sec 1.00 90.8±1.36ms ? ?/sec
Note I expect another 30-40% combined savings between https://github.com/apache/datafusion/pull/10356 and https://github.com/apache/datafusion/issues/10209 and https://github.com/apache/datafusion/issues/9873
Here is where we currently stand with planning performance compared to 37 and 38
Highlight: TPC-DS 76% faster planning, TPCH 64% faster
group 37.0.0 38.0.0 main
----- ------ ------ ----
physical_plan_tpcds_all 1.76 2.2±0.01s ? ?/sec 1.06 1322.5±10.01ms ? ?/sec 1.00 1253.2±10.31ms ? ?/sec
physical_plan_tpch_all 1.64 140.6±2.49ms ? ?/sec 1.04 89.5±1.43ms ? ?/sec 1.00 85.8±1.53ms ? ?/sec
Highlight: SELECT * ..
with 1000 columns is 11x faster
group 37.0.0 38.0.0 main
----- ------ ------ ----
physical_select_all_from_1000 11.37 689.9±1.24ms ? ?/sec 1.00 60.7±0.29ms ? ?/sec 1.01 61.4±0.34ms ? ?/sec
physical_select_one_from_700 1.17 4.1±0.04ms ? ?/sec 1.00 3.5±0.03ms ? ?/sec 1.01 3.6±0.06ms ? ?/sec
++ critcmp main 38.0.0 37.0.0
group 37.0.0 38.0.0 main
----- ------ ------ ----
logical_aggregate_with_join 1.27 1275.1±14.29µs ? ?/sec 1.22 1219.9±21.06µs ? ?/sec 1.00 1003.7±14.68µs ? ?/sec
logical_plan_tpcds_all 1.14 171.6±2.25ms ? ?/sec 1.05 157.8±1.74ms ? ?/sec 1.00 150.9±1.77ms ? ?/sec
logical_plan_tpch_all 1.08 17.9±0.35ms ? ?/sec 1.02 17.0±0.17ms ? ?/sec 1.00 16.6±0.18ms ? ?/sec
logical_select_all_from_1000 5.05 94.5±0.86ms ? ?/sec 1.00 18.7±0.10ms ? ?/sec 1.01 18.9±0.16ms ? ?/sec
logical_select_one_from_700 1.00 750.1±11.67µs ? ?/sec 1.08 810.9±32.07µs ? ?/sec 1.08 811.4±16.40µs ? ?/sec
logical_trivial_join_high_numbered_columns 1.05 793.2±10.89µs ? ?/sec 1.01 762.2±8.23µs ? ?/sec 1.00 756.0±12.71µs ? ?/sec
logical_trivial_join_low_numbered_columns 1.03 762.6±12.18µs ? ?/sec 1.00 741.5±10.89µs ? ?/sec 1.00 740.0±7.52µs ? ?/sec
physical_plan_tpcds_all 1.76 2.2±0.01s ? ?/sec 1.06 1322.5±10.01ms ? ?/sec 1.00 1253.2±10.31ms ? ?/sec
physical_plan_tpch_all 1.64 140.6±2.49ms ? ?/sec 1.04 89.5±1.43ms ? ?/sec 1.00 85.8±1.53ms ? ?/sec
physical_plan_tpch_q1 1.81 8.2±0.08ms ? ?/sec 1.10 5.0±0.13ms ? ?/sec 1.00 4.5±0.06ms ? ?/sec
physical_plan_tpch_q10 1.65 6.7±0.09ms ? ?/sec 1.07 4.3±0.06ms ? ?/sec 1.00 4.0±0.05ms ? ?/sec
physical_plan_tpch_q11 1.49 5.3±0.11ms ? ?/sec 1.07 3.8±0.08ms ? ?/sec 1.00 3.5±0.07ms ? ?/sec
physical_plan_tpch_q12 1.59 4.3±0.07ms ? ?/sec 1.12 3.0±0.05ms ? ?/sec 1.00 2.7±0.06ms ? ?/sec
physical_plan_tpch_q13 1.39 2.8±0.04ms ? ?/sec 1.02 2.1±0.03ms ? ?/sec 1.00 2.0±0.04ms ? ?/sec
physical_plan_tpch_q14 1.51 3.6±0.06ms ? ?/sec 1.12 2.7±0.08ms ? ?/sec 1.00 2.4±0.04ms ? ?/sec
physical_plan_tpch_q16 1.63 5.5±0.07ms ? ?/sec 1.09 3.7±0.13ms ? ?/sec 1.00 3.4±0.05ms ? ?/sec
physical_plan_tpch_q17 1.57 5.1±0.07ms ? ?/sec 1.07 3.5±0.10ms ? ?/sec 1.00 3.2±0.05ms ? ?/sec
physical_plan_tpch_q18 1.56 5.7±0.10ms ? ?/sec 1.06 3.9±0.12ms ? ?/sec 1.00 3.7±0.06ms ? ?/sec
physical_plan_tpch_q19 1.93 10.6±0.09ms ? ?/sec 1.11 6.1±0.14ms ? ?/sec 1.00 5.5±0.10ms ? ?/sec
physical_plan_tpch_q2 1.71 12.8±0.07ms ? ?/sec 1.03 7.7±0.15ms ? ?/sec 1.00 7.5±0.12ms ? ?/sec
physical_plan_tpch_q20 1.61 6.9±0.13ms ? ?/sec 1.02 4.4±0.07ms ? ?/sec 1.00 4.3±0.12ms ? ?/sec
physical_plan_tpch_q21 1.69 10.0±0.16ms ? ?/sec 1.03 6.1±0.12ms ? ?/sec 1.00 5.9±0.09ms ? ?/sec
physical_plan_tpch_q22 1.56 4.9±0.13ms ? ?/sec 1.04 3.3±0.06ms ? ?/sec 1.00 3.1±0.05ms ? ?/sec
physical_plan_tpch_q3 1.49 4.4±0.10ms ? ?/sec 1.07 3.2±0.11ms ? ?/sec 1.00 3.0±0.04ms ? ?/sec
physical_plan_tpch_q4 1.58 3.5±0.05ms ? ?/sec 1.05 2.3±0.07ms ? ?/sec 1.00 2.2±0.03ms ? ?/sec
physical_plan_tpch_q5 1.49 6.3±0.10ms ? ?/sec 1.05 4.4±0.09ms ? ?/sec 1.00 4.2±0.07ms ? ?/sec
physical_plan_tpch_q6 1.48 2.1±0.05ms ? ?/sec 1.08 1553.5±30.77µs ? ?/sec 1.00 1435.2±12.73µs ? ?/sec
physical_plan_tpch_q7 1.71 9.1±0.08ms ? ?/sec 1.05 5.6±0.08ms ? ?/sec 1.00 5.3±0.07ms ? ?/sec
physical_plan_tpch_q8 1.84 12.7±0.17ms ? ?/sec 1.05 7.3±0.12ms ? ?/sec 1.00 6.9±0.11ms ? ?/sec
physical_plan_tpch_q9 1.86 9.7±0.08ms ? ?/sec 1.04 5.4±0.10ms ? ?/sec 1.00 5.2±0.08ms ? ?/sec
physical_select_all_from_1000 11.37 689.9±1.24ms ? ?/sec 1.00 60.7±0.29ms ? ?/sec 1.01 61.4±0.34ms ? ?/sec
physical_select_one_from_700 1.17 4.1±0.04ms ? ?/sec 1.00 3.5±0.03ms ? ?/sec 1.01 3.6±0.06ms ? ?/sec
Test script
Details
set -x -e
## This script tests planning speed of 37.0.0 against the speed on planning on main
git fetch -p apache
git fetch -p alamb
# remove old test runs
rm -rf target/criterion/
# Compare version 38
git checkout 38.0.0
cargo update
cargo bench --bench sql_planner -- --save-baseline "38.0.0"
# use a version of 37 with the tpcds benchmarks
git checkout alamb/37_bench
git reset --hard alamb/alamb/37_bench
cargo update
cargo bench --bench sql_planner -- --save-baseline "37.0.0"
echo "** Comparing to main"
git checkout main
git reset --hard apache/main
cargo update
cargo bench --bench sql_planner -- --save-baseline main
critcmp main "38.0.0" "37.0.0"