datafusion
datafusion copied to clipboard
Better CSE identifier
This is a draft PR that implements the ideas from https://github.com/apache/datafusion/issues/10426#issuecomment-2105664520.
Which issue does this PR close?
Closes https://github.com/apache/datafusion/issues/10426.
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?
I think @peter-toth plans to break this PR up into smaller ones, so marking it as a draft to make it clear it isn't waiting on more feedback. If I am mistaken, please let me know
I think @peter-toth plans to break this PR up into smaller ones, so marking it as a draft to make it clear it isn't waiting on more feedback. If I am mistaken, please let me know
Yes, here is the first part that adds the new TreeNode APIs: https://github.com/apache/datafusion/pull/10543
@alamb, can you please help me with that MSRV failure? I don't know what could be the source of the failure and if I try running cargo msrv verify locally it fails with:
% cargo msrv verify
Fetching index
Unable to find key 'package.rust-version' (or 'package.metadata.msrv') in '/Users/ptoth/git/apache/arrow-datafusion/Cargo.toml'
This PR is more or less ready for review, tests are passing except for the MSRV. I focused only on the 3 performance improvements and deliberately kept the code as close to the original as possible to ease review. I plan to open a follow-up PR to tackle remaining TODOs and refactor CSE code a bit for better readability.
Local benchmarks show good perfroamance improvements, but @alamb please confirm it with your standard setup.
% critcmp main better-cse-identifier
group better-cse-identifier main
----- --------------------- ----
logical_aggregate_with_join 1.00 529.3±8.70µs ? ?/sec 1.00 528.4±6.17µs ? ?/sec
logical_plan_tpcds_all 1.00 71.3±2.74ms ? ?/sec 1.06 75.4±16.47ms ? ?/sec
logical_plan_tpch_all 1.00 7.3±0.19ms ? ?/sec 1.09 7.9±2.34ms ? ?/sec
logical_select_all_from_1000 1.02 15.9±2.00ms ? ?/sec 1.00 15.6±1.27ms ? ?/sec
logical_select_one_from_700 1.00 399.9±3.73µs ? ?/sec 1.02 408.3±4.29µs ? ?/sec
logical_trivial_join_high_numbered_columns 1.00 422.7±92.70µs ? ?/sec 1.00 423.7±88.41µs ? ?/sec
logical_trivial_join_low_numbered_columns 1.00 380.5±5.11µs ? ?/sec 1.01 385.3±4.22µs ? ?/sec
physical_plan_tpcds_all 1.00 536.8±11.44ms ? ?/sec 1.09 586.2±20.88ms ? ?/sec
physical_plan_tpch_all 1.01 38.5±12.88ms ? ?/sec 1.00 38.1±0.59ms ? ?/sec
physical_plan_tpch_q1 1.00 1108.3±12.50µs ? ?/sec 1.43 1585.0±12.90µs ? ?/sec
physical_plan_tpch_q10 1.00 1598.6±219.13µs ? ?/sec 1.15 1842.3±280.13µs ? ?/sec
physical_plan_tpch_q11 1.00 1339.7±14.58µs ? ?/sec 1.15 1540.2±247.46µs ? ?/sec
physical_plan_tpch_q12 1.00 1117.3±63.22µs ? ?/sec 1.12 1252.1±17.87µs ? ?/sec
physical_plan_tpch_q13 1.00 769.6±19.54µs ? ?/sec 1.12 864.0±12.96µs ? ?/sec
physical_plan_tpch_q14 1.00 930.6±216.63µs ? ?/sec 1.16 1081.1±125.78µs ? ?/sec
physical_plan_tpch_q16 1.00 1331.4±38.36µs ? ?/sec 1.15 1534.8±261.21µs ? ?/sec
physical_plan_tpch_q17 1.00 1209.8±29.88µs ? ?/sec 1.09 1322.0±17.01µs ? ?/sec
physical_plan_tpch_q18 1.00 1532.7±240.18µs ? ?/sec 1.04 1593.0±20.68µs ? ?/sec
physical_plan_tpch_q19 1.00 2.5±0.08ms ? ?/sec 1.06 2.7±0.04ms ? ?/sec
physical_plan_tpch_q2 1.00 3.2±0.70ms ? ?/sec 1.06 3.4±0.72ms ? ?/sec
physical_plan_tpch_q20 1.00 1642.3±16.61µs ? ?/sec 1.12 1845.0±302.12µs ? ?/sec
physical_plan_tpch_q21 1.00 2.6±0.65ms ? ?/sec 1.07 2.7±0.64ms ? ?/sec
physical_plan_tpch_q22 1.00 1196.6±101.79µs ? ?/sec 1.08 1292.8±15.01µs ? ?/sec
physical_plan_tpch_q3 1.00 1105.8±28.61µs ? ?/sec 1.23 1358.9±236.19µs ? ?/sec
physical_plan_tpch_q4 1.00 852.5±13.28µs ? ?/sec 1.12 957.4±161.05µs ? ?/sec
physical_plan_tpch_q5 1.00 1685.6±23.24µs ? ?/sec 1.10 1857.1±21.46µs ? ?/sec
physical_plan_tpch_q6 1.00 642.0±188.39µs ? ?/sec 1.01 647.4±12.04µs ? ?/sec
physical_plan_tpch_q7 1.00 2.2±0.45ms ? ?/sec 1.08 2.4±0.54ms ? ?/sec
physical_plan_tpch_q8 1.00 2.7±0.05ms ? ?/sec 1.08 2.9±0.06ms ? ?/sec
physical_plan_tpch_q9 1.00 1964.3±55.93µs ? ?/sec 1.08 2.1±0.04ms ? ?/sec
I am starting the benchmark run now -- I'll report back here and give this PR a look as soon as possible (but maybe not until tomorrow)
Thank you so much @peter-toth
My results appear consistent with yours @peter-toth -- looks like maybe the high column count case is slightly worse -- I can maybe profile it to see if I can find any reason that might be
++ critcmp main better-cse-identifier
group better-cse-identifier main
----- --------------------- ----
logical_aggregate_with_join 1.01 1023.5±41.82µs ? ?/sec 1.00 1008.6±10.43µs ? ?/sec
logical_plan_tpcds_all 1.00 152.4±1.06ms ? ?/sec 1.00 152.2±1.00ms ? ?/sec
logical_plan_tpch_all 1.00 16.9±0.19ms ? ?/sec 1.01 17.1±0.26ms ? ?/sec
logical_select_all_from_1000 1.02 18.9±0.12ms ? ?/sec 1.00 18.6±1.74ms ? ?/sec
logical_select_one_from_700 1.01 829.8±9.09µs ? ?/sec 1.00 823.9±10.68µs ? ?/sec
logical_trivial_join_high_numbered_columns 1.00 774.1±10.87µs ? ?/sec 1.00 770.5±27.71µs ? ?/sec
logical_trivial_join_low_numbered_columns 1.00 765.0±32.06µs ? ?/sec 1.00 763.9±9.85µs ? ?/sec
physical_plan_tpcds_all 1.00 1073.0±12.72ms ? ?/sec 1.08 1159.6±5.63ms ? ?/sec
physical_plan_tpch_all 1.00 71.7±0.67ms ? ?/sec 1.10 79.0±0.88ms ? ?/sec
physical_plan_tpch_q1 1.00 2.5±0.04ms ? ?/sec 1.36 3.5±0.02ms ? ?/sec
physical_plan_tpch_q10 1.00 3.6±0.03ms ? ?/sec 1.12 4.0±0.03ms ? ?/sec
physical_plan_tpch_q11 1.00 3.1±0.03ms ? ?/sec 1.09 3.4±0.03ms ? ?/sec
physical_plan_tpch_q12 1.00 2.4±0.01ms ? ?/sec 1.12 2.7±0.02ms ? ?/sec
physical_plan_tpch_q13 1.00 1805.2±95.39µs ? ?/sec 1.11 2.0±0.01ms ? ?/sec
physical_plan_tpch_q14 1.00 2.0±0.03ms ? ?/sec 1.18 2.4±0.02ms ? ?/sec
physical_plan_tpch_q16 1.00 3.0±0.03ms ? ?/sec 1.11 3.3±0.03ms ? ?/sec
physical_plan_tpch_q17 1.00 2.8±0.02ms ? ?/sec 1.09 3.0±0.03ms ? ?/sec
physical_plan_tpch_q18 1.00 3.2±0.03ms ? ?/sec 1.09 3.5±0.02ms ? ?/sec
physical_plan_tpch_q19 1.00 4.9±0.06ms ? ?/sec 1.05 5.2±0.05ms ? ?/sec
physical_plan_tpch_q2 1.00 6.3±0.04ms ? ?/sec 1.05 6.6±0.05ms ? ?/sec
physical_plan_tpch_q20 1.00 3.7±0.04ms ? ?/sec 1.07 3.9±0.04ms ? ?/sec
physical_plan_tpch_q21 1.00 5.1±0.06ms ? ?/sec 1.06 5.4±0.03ms ? ?/sec
physical_plan_tpch_q22 1.00 2.7±0.02ms ? ?/sec 1.10 3.0±0.03ms ? ?/sec
physical_plan_tpch_q3 1.00 2.6±0.03ms ? ?/sec 1.16 2.9±0.02ms ? ?/sec
physical_plan_tpch_q4 1.00 1956.8±10.71µs ? ?/sec 1.09 2.1±0.02ms ? ?/sec
physical_plan_tpch_q5 1.00 3.7±0.04ms ? ?/sec 1.09 4.1±0.02ms ? ?/sec
physical_plan_tpch_q6 1.00 1316.0±9.67µs ? ?/sec 1.10 1452.2±16.94µs ? ?/sec
physical_plan_tpch_q7 1.00 4.7±0.03ms ? ?/sec 1.04 4.8±0.03ms ? ?/sec
physical_plan_tpch_q8 1.00 5.8±0.04ms ? ?/sec 1.07 6.2±0.04ms ? ?/sec
physical_plan_tpch_q9 1.00 4.4±0.03ms ? ?/sec 1.07 4.7±0.03ms ? ?/sec
physical_select_all_from_1000 1.04 46.8±0.32ms ? ?/sec 1.00 45.1±0.20ms ? ?/sec
physical_select_one_from_700 1.03 3.5±0.02ms ? ?/sec 1.00 3.4±0.02ms ? ?/sec
@alamb, can you please help me with that MSRV failure? I don't know what could be the source of the failure and if I try running
cargo msrv verifylocally it fails with:% cargo msrv verify Fetching index Unable to find key 'package.rust-version' (or 'package.metadata.msrv') in '/Users/ptoth/git/apache/arrow-datafusion/Cargo.toml'
I've bumped MSRV to 1.76 in https://github.com/apache/datafusion/pull/10473/commits/ccc92b9ab00813a69583abe2ec4ab5517107be35. It we want to avoid that then let me know and I will try to find out what is not available in 1.75.
I ran the benchmarks again:
Looks to me like this PR makes planning 10% faster
physical_plan_tpcds_all 1.00 1074.8±5.94ms ? ?/sec 1.10 1181.1±6.54ms ? ?/sec
physical_plan_tpch_all 1.00 72.2±0.77ms ? ?/sec 1.11 79.8±0.67ms ? ?/sec
🚀
++ critcmp main better-cse-identifier
group better-cse-identifier main
----- --------------------- ----
logical_aggregate_with_join 1.00 1038.1±44.37µs ? ?/sec 1.00 1040.8±49.37µs ? ?/sec
logical_plan_tpcds_all 1.00 153.0±1.07ms ? ?/sec 1.01 154.3±0.98ms ? ?/sec
logical_plan_tpch_all 1.00 17.1±0.20ms ? ?/sec 1.01 17.2±0.20ms ? ?/sec
logical_select_all_from_1000 1.00 18.9±0.09ms ? ?/sec 1.01 19.0±0.11ms ? ?/sec
logical_select_one_from_700 1.01 840.6±10.15µs ? ?/sec 1.00 836.0±9.66µs ? ?/sec
logical_trivial_join_high_numbered_columns 1.00 789.5±33.87µs ? ?/sec 1.00 791.5±15.53µs ? ?/sec
logical_trivial_join_low_numbered_columns 1.00 771.3±21.54µs ? ?/sec 1.02 785.4±21.89µs ? ?/sec
physical_plan_tpcds_all 1.00 1074.8±5.94ms ? ?/sec 1.10 1181.1±6.54ms ? ?/sec
physical_plan_tpch_all 1.00 72.2±0.77ms ? ?/sec 1.11 79.8±0.67ms ? ?/sec
physical_plan_tpch_q1 1.00 2.6±0.02ms ? ?/sec 1.38 3.5±0.03ms ? ?/sec
physical_plan_tpch_q10 1.00 3.6±0.03ms ? ?/sec 1.14 4.1±0.06ms ? ?/sec
physical_plan_tpch_q11 1.00 3.1±0.03ms ? ?/sec 1.11 3.4±0.04ms ? ?/sec
physical_plan_tpch_q12 1.00 2.4±0.02ms ? ?/sec 1.13 2.8±0.02ms ? ?/sec
physical_plan_tpch_q13 1.00 1814.3±19.22µs ? ?/sec 1.12 2.0±0.02ms ? ?/sec
physical_plan_tpch_q14 1.00 2.0±0.02ms ? ?/sec 1.18 2.4±0.02ms ? ?/sec
physical_plan_tpch_q16 1.00 3.0±0.03ms ? ?/sec 1.12 3.4±0.03ms ? ?/sec
physical_plan_tpch_q17 1.00 2.8±0.03ms ? ?/sec 1.10 3.1±0.03ms ? ?/sec
physical_plan_tpch_q18 1.00 3.3±0.03ms ? ?/sec 1.11 3.6±0.04ms ? ?/sec
physical_plan_tpch_q19 1.00 4.8±0.05ms ? ?/sec 1.07 5.2±0.05ms ? ?/sec
physical_plan_tpch_q2 1.00 6.4±0.15ms ? ?/sec 1.05 6.7±0.05ms ? ?/sec
physical_plan_tpch_q20 1.00 3.6±0.03ms ? ?/sec 1.08 3.9±0.03ms ? ?/sec
physical_plan_tpch_q21 1.00 5.1±0.07ms ? ?/sec 1.08 5.4±0.04ms ? ?/sec
physical_plan_tpch_q22 1.00 2.7±0.03ms ? ?/sec 1.10 3.0±0.02ms ? ?/sec
physical_plan_tpch_q3 1.00 2.6±0.02ms ? ?/sec 1.16 3.0±0.02ms ? ?/sec
physical_plan_tpch_q4 1.00 1974.3±23.01µs ? ?/sec 1.09 2.1±0.03ms ? ?/sec
physical_plan_tpch_q5 1.00 3.7±0.04ms ? ?/sec 1.10 4.1±0.06ms ? ?/sec
physical_plan_tpch_q6 1.00 1321.5±35.02µs ? ?/sec 1.11 1472.4±20.45µs ? ?/sec
physical_plan_tpch_q7 1.00 4.7±0.04ms ? ?/sec 1.05 4.9±0.05ms ? ?/sec
physical_plan_tpch_q8 1.00 5.9±0.08ms ? ?/sec 1.09 6.4±0.06ms ? ?/sec
physical_plan_tpch_q9 1.00 4.4±0.05ms ? ?/sec 1.08 4.8±0.04ms ? ?/sec
physical_select_all_from_1000 1.00 46.7±0.40ms ? ?/sec 1.02 47.7±0.29ms ? ?/sec
physical_select_one_from_700 1.00 3.5±0.02ms ? ?/sec 1.01 3.5±0.03ms ? ?/sec
🚀 thanks again @peter-toth