[Feature](nereids) Support Mark join

Open zhengshiJ opened this issue 2 years ago • 60 comments

Proposed changes

Issue Number: close #16572, close #14314

Problem summary

1. The new optimizer (Nereids) supports subqueries combined with disjunctions (OR). It handles them with Mark Join and behaves the same as the old optimizer. For design details, see https://emmymiao87.github.io/jekyll/update/2021/07/25/Mark-Join.html.
2. Implicit type conversion is performed when conjuncts are generated after subquery parsing.
3. The unnesting of a scalar subquery in a filter changes from filter + join to join + conjuncts: the comparison becomes an other-join conjunct of the join instead of a filter above it. For example:

Uncorrelated: EXPLAIN logical plan SELECT * FROM sub_query_correlated_subquery1 WHERE k1 > (SELECT AVG(k1) FROM sub_query_correlated_subquery3);

before

+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Explain String                                                                                                                                                                                                                                   |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| LogicalProject ( projects=[k1#0, k2#1], excepts=[], canEliminate=true )                                                                                                                                                                          |
| +--LogicalFilter ( predicates=(cast(k1#0 as DOUBLE) > AVG(k1)#7))                                                                                                                                                                                |
| +--LogicalJoin ( type=CROSS_JOIN, markJoinSlotReference=Optional.empty, hashJoinConjuncts=[], otherJoinConjuncts=[] )                                                                                                                            |
|    |--LogicalOlapScan ( qualified=default_cluster:regression_test_nereids_syntax_p0.sub_query_correlated_subquery1, output=[k1#0, k2#1], indexName=sub_query_correlated_subquery1, selectedIndexId=29014, preAgg=ON )                            |
|    +--LogicalAssertNumRows ( assertNumRowsElement=AssertNumRowsElement ( desiredNumOfRows=1, assertion=EQ ) )                                                                                                                                    |
|       +--LogicalAggregate ( groupByExpr=[], outputExpr=[avg(k1#2) AS `AVG(k1)`#7], hasRepeat=false )                                                                                                                                             |
|          +--LogicalProject ( projects=[k1#2], excepts=[], canEliminate=true )                                                                                                                                                                    |
|             +--LogicalOlapScan ( qualified=default_cluster:regression_test_nereids_syntax_p0.sub_query_correlated_subquery3, output=[k1#2, k2#3, k3#4, v1#5, v2#6], indexName=sub_query_correlated_subquery3, selectedIndexId=29024, preAgg=ON ) |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

after

+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Explain String                                                                                                                                                                                                                                   |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| LogicalProject ( projects=[k1#0, k2#1], excepts=[], canEliminate=true )                                                                                                                                                                          |
| +--LogicalJoin ( type=CROSS_JOIN, markJoinSlotReference=Optional.empty, hashJoinConjuncts=[], otherJoinConjuncts=[(cast(k1#0 as DOUBLE) > AVG(k1)#7)] )                                                                                          |
|    |--LogicalOlapScan ( qualified=default_cluster:regression_test_nereids_syntax_p0.sub_query_correlated_subquery1, output=[k1#0, k2#1], indexName=sub_query_correlated_subquery1, selectedIndexId=29014, preAgg=ON )                            |
|    +--LogicalAssertNumRows ( assertNumRowsElement=AssertNumRowsElement ( desiredNumOfRows=1, assertion=EQ ) )                                                                                                                                    |
|       +--LogicalAggregate ( groupByExpr=[], outputExpr=[avg(k1#2) AS `AVG(k1)`#7], hasRepeat=false )                                                                                                                                             |
|          +--LogicalProject ( projects=[k1#2], excepts=[], canEliminate=true )                                                                                                                                                                    |
|             +--LogicalOlapScan ( qualified=default_cluster:regression_test_nereids_syntax_p0.sub_query_correlated_subquery3, output=[k1#2, k2#3, k3#4, v1#5, v2#6], indexName=sub_query_correlated_subquery3, selectedIndexId=29024, preAgg=ON ) |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
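To illustrate item 3 on the uncorrelated example, here is a minimal Python sketch (not Nereids code) that models tables as lists of dicts. It checks that evaluating the comparison as an other-join conjunct of the cross join returns the same rows as the old plan with a filter on top, given that the right side is asserted to have exactly one row. All helper names and the sample data are hypothetical; `float(k1)` stands in for the implicit `cast(k1 as DOUBLE)` from item 2.

```python
from typing import Any, Callable, Dict, List, Optional

# Hypothetical helpers for illustration only; these are not Nereids APIs.
Row = Dict[str, Any]


def assert_num_rows(rows: List[Row], desired: int = 1) -> List[Row]:
    """Mimic LogicalAssertNumRows(desiredNumOfRows=1, assertion=EQ)."""
    if len(rows) != desired:
        raise ValueError(f"expected {desired} row(s), got {len(rows)}")
    return rows


def avg_k1(rows: List[Row]) -> List[Row]:
    """Global aggregate without GROUP BY: always one row; AVG of no rows is NULL (None)."""
    avg = sum(r["k1"] for r in rows) / len(rows) if rows else None
    return [{"avg_k1": avg}]


def cross_join(left: List[Row], right: List[Row],
               conjunct: Optional[Callable[[Row], bool]] = None) -> List[Row]:
    """Nested-loop cross join; an optional other-join conjunct filters each pair."""
    out = []
    for l_row in left:
        for r_row in right:
            row = {**l_row, **r_row}
            if conjunct is None or conjunct(row):
                out.append(row)
    return out


def greater_than_avg(row: Row) -> bool:
    # cast(k1 as DOUBLE) > AVG(k1); a comparison with NULL is never TRUE
    return row["avg_k1"] is not None and float(row["k1"]) > row["avg_k1"]


def plan_before(t1: List[Row], t3: List[Row]) -> List[Row]:
    # Filter placed above a conjunct-free cross join
    joined = cross_join(t1, assert_num_rows(avg_k1(t3)))
    return [row for row in joined if greater_than_avg(row)]


def plan_after(t1: List[Row], t3: List[Row]) -> List[Row]:
    # Same predicate evaluated inside the join as an other-join conjunct
    return cross_join(t1, assert_num_rows(avg_k1(t3)), greater_than_avg)


def project(rows: List[Row]) -> List[Row]:
    return [{"k1": r["k1"], "k2": r["k2"]} for r in rows]


t1 = [{"k1": 1, "k2": 10}, {"k1": 5, "k2": 50}, {"k1": 9, "k2": 90}]
t3 = [{"k1": 2}, {"k1": 4}, {"k1": 6}]  # AVG(k1) = 4.0
print(project(plan_after(t1, t3)))  # [{'k1': 5, 'k2': 50}, {'k1': 9, 'k2': 90}]
```

Because the right side always has exactly one row, both plans return the same rows; the rewrite only changes where the predicate is evaluated.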

Correlated: EXPLAIN logical plan SELECT * FROM sub_query_correlated_subquery1 WHERE k1 > (SELECT AVG(k1) FROM sub_query_correlated_subquery3 WHERE sub_query_correlated_subquery1.k1 = sub_query_correlated_subquery3.k1);

before

+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Explain String                                                                                                                                                                                                                                   |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| LogicalProject ( projects=[k1#0, k2#1], excepts=[], canEliminate=true )                                                                                                                                                                          |
| +--LogicalFilter ( predicates=(cast(k1#0 as DOUBLE) > AVG(k1)#7))                                                                                                                                                                                |
| +--LogicalJoin ( type=LEFT_OUTER_JOIN, markJoinSlotReference=Optional.empty, hashJoinConjuncts=[], otherJoinConjuncts=[] )                                                                                                                        |
|    |--LogicalOlapScan ( qualified=default_cluster:regression_test_nereids_syntax_p0.sub_query_correlated_subquery1, output=[k1#0, k2#1], indexName=sub_query_correlated_subquery1, selectedIndexId=29014, preAgg=ON )                            |
|    +--LogicalAssertNumRows ( assertNumRowsElement=AssertNumRowsElement ( desiredNumOfRows=1, assertion=EQ ) )                                                                                                                                    |
|       +--LogicalAggregate ( groupByExpr=[], outputExpr=[avg(k1#2) AS `AVG(k1)`#7], hasRepeat=false )                                                                                                                                             |
|          +--LogicalProject ( projects=[k1#2], excepts=[], canEliminate=true )                                                                                                                                                                    |
|             +--LogicalOlapScan ( qualified=default_cluster:regression_test_nereids_syntax_p0.sub_query_correlated_subquery3, output=[k1#2, k2#3, k3#4, v1#5, v2#6], indexName=sub_query_correlated_subquery3, selectedIndexId=29024, preAgg=ON ) |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

after

+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Explain String                                                                                                                                                                                                                                   |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| LogicalProject ( projects=[k1#0, k2#1], excepts=[], canEliminate=true )                                                                                                                                                                          |
| +--LogicalJoin ( type=LEFT_SEMI_JOIN, markJoinSlotReference=Optional.empty, hashJoinConjuncts=[], otherJoinConjuncts=[(cast(k1#0 as DOUBLE) > AVG(k1)#7)] )                                                                                      |
|    |--LogicalOlapScan ( qualified=default_cluster:regression_test_nereids_syntax_p0.sub_query_correlated_subquery1, output=[k1#0, k2#1], indexName=sub_query_correlated_subquery1, selectedIndexId=29014, preAgg=ON )                            |
|    +--LogicalAssertNumRows ( assertNumRowsElement=AssertNumRowsElement ( desiredNumOfRows=1, assertion=EQ ) )                                                                                                                                    |
|       +--LogicalAggregate ( groupByExpr=[], outputExpr=[avg(k1#2) AS `AVG(k1)`#7], hasRepeat=false )                                                                                                                                             |
|          +--LogicalProject ( projects=[k1#2], excepts=[], canEliminate=true )                                                                                                                                                                    |
|             +--LogicalOlapScan ( qualified=default_cluster:regression_test_nereids_syntax_p0.sub_query_correlated_subquery3, output=[k1#2, k2#3, k3#4, v1#5, v2#6], indexName=sub_query_correlated_subquery3, selectedIndexId=29024, preAgg=ON ) |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
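Item 1 relies on Mark Join. A semi join like the one above drops every left row without a match, which is not enough when the subquery sits under an OR (for example `WHERE k1 IN (SELECT k1 FROM t3) OR k2 > 60`). A mark join instead keeps every left row and appends a mark column whose value is TRUE, FALSE or NULL under SQL three-valued logic; a filter above the join then evaluates the whole disjunction. The Python sketch below assumes an `IN` subquery with an equality condition and uses `None` for NULL. It illustrates the general technique in the spirit of the design note linked in item 1, not the Nereids implementation, and every name in it is hypothetical.

```python
from typing import Any, Dict, List, Optional

# Hypothetical helpers for illustration only; these are not Nereids APIs.
Row = Dict[str, Any]


def in_mark(value: Any, candidates: List[Any]) -> Optional[bool]:
    """Three-valued result of `value IN (candidates)`; None stands for NULL."""
    if not candidates:
        return False  # IN over an empty set is FALSE, even for a NULL value
    if value is None:
        return None  # NULL compared with anything is unknown
    if any(c == value for c in candidates if c is not None):
        return True
    if any(c is None for c in candidates):
        return None  # no match, but a NULL candidate might have matched
    return False


def mark_join(left: List[Row], right_values: List[Any], key: str,
              mark: str = "mark") -> List[Row]:
    """Keep every left row and append the mark column instead of filtering."""
    return [{**row, mark: in_mark(row[key], right_values)} for row in left]


def sql_or(a: Optional[bool], b: Optional[bool]) -> Optional[bool]:
    """SQL OR under three-valued logic."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False


# WHERE k1 IN (SELECT k1 FROM t3) OR k2 > 60
t1 = [{"k1": 1, "k2": 10}, {"k1": 5, "k2": 50},
      {"k1": 9, "k2": 90}, {"k1": None, "k2": 20}]
t3_k1 = [1, None]

marked = mark_join(t1, t3_k1, key="k1")
# WHERE keeps a row only when the predicate is TRUE (not FALSE, not NULL)
result = [r for r in marked if sql_or(r["mark"], r["k2"] > 60) is True]
print([(r["k1"], r["k2"]) for r in result])  # [(1, 10), (9, 90)]
```

A plain semi join on `k1 IN (...)` would keep only the row with `k1 = 1` and lose `k1 = 9`, which qualifies only through the `OR` branch; emitting the mark instead of filtering is what lets the disjunction be evaluated afterwards.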

Checklist (Required)

  1. Does it affect the original behavior:
    • [ ] Yes
    • [x] No
    • [ ] I don't know
  2. Have unit tests been added:
    • [x] Yes
    • [ ] No
    • [ ] No Need
  3. Has documentation been added or modified:
    • [ ] Yes
    • [ ] No
    • [x] No Need
  4. Does it need to update dependencies:
    • [ ] Yes
    • [x] No
  5. Are there any changes that cannot be rolled back:
    • [ ] Yes (If Yes, please explain WHY)
    • [x] No

Further comments

If this is a relatively large or complex change, kick off the discussion at [email protected] by explaining why you chose the solution you did and what alternatives you considered, etc...

zhengshiJ, Feb 10 '23 09:02

clang-tidy review says "All clean, LGTM! :+1:"

github-actions[bot], Feb 16 '23 05:02