doris
[Feature](nereids) Support Mark join
Proposed changes
Issue Number: close #16572, close #14314
Problem summary
1. The new optimizer (Nereids) supports subqueries combined with disjunctions (OR). It handles them with a mark join and behaves the same as the old optimizer. For design details, see: https://emmymiao87.github.io/jekyll/update/2021/07/25/Mark-Join.html
2. Implicit type conversion is performed when conjuncts are generated after subquery parsing.
3. Unnesting a scalar subquery in a filter now produces a join with conjuncts instead of a filter on top of a join.
Example:
Uncorrelated subquery
EXPLAIN logical plan SELECT * FROM sub_query_correlated_subquery1 WHERE k1 > (SELECT AVG(k1) FROM sub_query_correlated_subquery3);
before
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Explain String |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| LogicalProject ( projects=[k1#0, k2#1], excepts=[], canEliminate=true ) |
| +--LogicalFilter ( predicates=(cast(k1#0 as DOUBLE) > AVG(k1)#7)) |
|    +--LogicalJoin ( type=CROSS_JOIN, markJoinSlotReference=Optional.empty, hashJoinConjuncts=[], otherJoinConjuncts=[] ) |
|       |--LogicalOlapScan ( qualified=default_cluster:regression_test_nereids_syntax_p0.sub_query_correlated_subquery1, output=[k1#0, k2#1], indexName=sub_query_correlated_subquery1, selectedIndexId=29014, preAgg=ON ) |
|       +--LogicalAssertNumRows ( assertNumRowsElement=AssertNumRowsElement ( desiredNumOfRows=1, assertion=EQ ) ) |
|          +--LogicalAggregate ( groupByExpr=[], outputExpr=[avg(k1#2) AS `AVG(k1)`#7], hasRepeat=false ) |
|             +--LogicalProject ( projects=[k1#2], excepts=[], canEliminate=true ) |
|                +--LogicalOlapScan ( qualified=default_cluster:regression_test_nereids_syntax_p0.sub_query_correlated_subquery3, output=[k1#2, k2#3, k3#4, v1#5, v2#6], indexName=sub_query_correlated_subquery3, selectedIndexId=29024, preAgg=ON ) |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
after
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Explain String |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| LogicalProject ( projects=[k1#0, k2#1], excepts=[], canEliminate=true ) |
| +--LogicalJoin ( type=CROSS_JOIN, markJoinSlotReference=Optional.empty, hashJoinConjuncts=[], otherJoinConjuncts=[(cast(k1#0 as DOUBLE) > AVG(k1)#7)] ) |
|    |--LogicalOlapScan ( qualified=default_cluster:regression_test_nereids_syntax_p0.sub_query_correlated_subquery1, output=[k1#0, k2#1], indexName=sub_query_correlated_subquery1, selectedIndexId=29014, preAgg=ON ) |
|    +--LogicalAssertNumRows ( assertNumRowsElement=AssertNumRowsElement ( desiredNumOfRows=1, assertion=EQ ) ) |
|       +--LogicalAggregate ( groupByExpr=[], outputExpr=[avg(k1#2) AS `AVG(k1)`#7], hasRepeat=false ) |
|          +--LogicalProject ( projects=[k1#2], excepts=[], canEliminate=true ) |
|             +--LogicalOlapScan ( qualified=default_cluster:regression_test_nereids_syntax_p0.sub_query_correlated_subquery3, output=[k1#2, k2#3, k3#4, v1#5, v2#6], indexName=sub_query_correlated_subquery3, selectedIndexId=29024, preAgg=ON ) |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
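The before/after plans for the uncorrelated case differ only in where the comparison is evaluated. A minimal Java sketch of that difference follows. It is not Doris code: the class and method names are hypothetical, tables are modeled as in-memory lists of `k1` values, and `AVG` over an empty input is assumed to yield NULL (modeled as `null`).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: not Doris code. All names here are hypothetical.
class ScalarSubqueryJoinSketch {

    // Stand-in for the one-row right side: SELECT AVG(k1) FROM t3 (null when t3 is empty).
    static Double avg(List<Integer> t3) {
        if (t3.isEmpty()) {
            return null;
        }
        double sum = 0;
        for (int v : t3) {
            sum += v;
        }
        return sum / t3.size();
    }

    // "before": build every joined row first, then apply the filter on top of the join.
    static List<Integer> filterOverCrossJoin(List<Integer> t1, List<Integer> t3) {
        Double avgK1 = avg(t3);
        List<Object[]> joined = new ArrayList<>();
        for (Integer k1 : t1) {
            joined.add(new Object[] {k1, avgK1});
        }
        List<Integer> result = new ArrayList<>();
        for (Object[] row : joined) {
            Integer k1 = (Integer) row[0];
            Double a = (Double) row[1];
            if (a != null && k1 > a) {
                result.add(k1);
            }
        }
        return result;
    }

    // "after": the comparison is a join conjunct, so non-matching rows are never emitted.
    static List<Integer> joinWithConjunct(List<Integer> t1, List<Integer> t3) {
        Double avgK1 = avg(t3);
        List<Integer> result = new ArrayList<>();
        for (Integer k1 : t1) {
            if (avgK1 != null && k1 > avgK1) {
                result.add(k1);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> t1 = Arrays.asList(1, 5, 9);
        List<Integer> t3 = Arrays.asList(2, 4, 6);
        System.out.println(filterOverCrossJoin(t1, t3));
        System.out.println(joinWithConjunct(t1, t3));
    }
}
```

Both methods return the same rows; the second simply avoids materializing joined rows that the filter would discard.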
Correlated subquery
EXPLAIN logical plan SELECT * FROM sub_query_correlated_subquery1 WHERE k1 > (SELECT AVG(k1) FROM sub_query_correlated_subquery3 WHERE sub_query_correlated_subquery1.k1 = sub_query_correlated_subquery3.k1);
before
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Explain String |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| LogicalProject ( projects=[k1#0, k2#1], excepts=[], canEliminate=true ) |
| +--LogicalFilter ( predicates=(cast(k1#0 as DOUBLE) > AVG(k1)#7)) |
|    +--LogicalJoin ( type=LEFT_OUTER_JOIN, markJoinSlotReference=Optional.empty, hashJoinConjuncts=[], otherJoinConjuncts=[] ) |
|       |--LogicalOlapScan ( qualified=default_cluster:regression_test_nereids_syntax_p0.sub_query_correlated_subquery1, output=[k1#0, k2#1], indexName=sub_query_correlated_subquery1, selectedIndexId=29014, preAgg=ON ) |
|       +--LogicalAssertNumRows ( assertNumRowsElement=AssertNumRowsElement ( desiredNumOfRows=1, assertion=EQ ) ) |
|          +--LogicalAggregate ( groupByExpr=[], outputExpr=[avg(k1#2) AS `AVG(k1)`#7], hasRepeat=false ) |
|             +--LogicalProject ( projects=[k1#2], excepts=[], canEliminate=true ) |
|                +--LogicalOlapScan ( qualified=default_cluster:regression_test_nereids_syntax_p0.sub_query_correlated_subquery3, output=[k1#2, k2#3, k3#4, v1#5, v2#6], indexName=sub_query_correlated_subquery3, selectedIndexId=29024, preAgg=ON ) |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
after
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Explain String |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| LogicalProject ( projects=[k1#0, k2#1], excepts=[], canEliminate=true ) |
| +--LogicalJoin ( type=LEFT_SEMI_JOIN, markJoinSlotReference=Optional.empty, hashJoinConjuncts=[], otherJoinConjuncts=[(cast(k1#0 as DOUBLE) > AVG(k1)#7)] ) |
|    |--LogicalOlapScan ( qualified=default_cluster:regression_test_nereids_syntax_p0.sub_query_correlated_subquery1, output=[k1#0, k2#1], indexName=sub_query_correlated_subquery1, selectedIndexId=29014, preAgg=ON ) |
|    +--LogicalAssertNumRows ( assertNumRowsElement=AssertNumRowsElement ( desiredNumOfRows=1, assertion=EQ ) ) |
|       +--LogicalAggregate ( groupByExpr=[], outputExpr=[avg(k1#2) AS `AVG(k1)`#7], hasRepeat=false ) |
|          +--LogicalProject ( projects=[k1#2], excepts=[], canEliminate=true ) |
|             +--LogicalOlapScan ( qualified=default_cluster:regression_test_nereids_syntax_p0.sub_query_correlated_subquery3, output=[k1#2, k2#3, k3#4, v1#5, v2#6], indexName=sub_query_correlated_subquery3, selectedIndexId=29024, preAgg=ON ) |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
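For item 1 of the problem summary, the idea behind a mark join can be sketched in a few lines of Java. This is an illustration only, not the Nereids implementation: the class, method, and table layout are hypothetical, the query shape `k1 IN (subquery) OR k2 < 10` is an assumed example, and NULL handling (three-valued logic) is deliberately left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch only: not Doris code. All names here are hypothetical.
class MarkJoinSketch {

    // Evaluates: SELECT k1, k2 FROM t1 WHERE k1 IN (SELECT k1 FROM t2) OR k2 < 10
    // Each t1 row is an int[]{k1, k2}; t2 is reduced to its k1 values.
    static List<int[]> markJoinThenFilter(List<int[]> t1, List<Integer> t2Keys) {
        Set<Integer> build = new HashSet<>(t2Keys);
        List<int[]> result = new ArrayList<>();
        for (int[] row : t1) {
            // Mark join: every left row is emitted exactly once, together with a
            // boolean "mark" that records whether the subquery predicate matched.
            boolean mark = build.contains(row[0]);
            // Filter above the join: the mark is one operand of the disjunction.
            if (mark || row[1] < 10) {
                result.add(row);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<int[]> t1 = Arrays.asList(new int[] {1, 100}, new int[] {2, 5}, new int[] {3, 100});
        List<Integer> t2Keys = Arrays.asList(1, 1, 4);
        for (int[] row : markJoinThenFilter(t1, t2Keys)) {
            System.out.println(row[0] + ", " + row[1]);
        }
    }
}
```

In this example the row with `k1 = 1` is kept because its mark is true, the row with `k1 = 2` is kept because `k2 < 10`, and the row with `k1 = 3` is dropped. A plain semi join could not express this, since it would discard the `k1 = 2` row before the `OR` is evaluated.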
Checklist(Required)
- Does it affect the original behavior:
- [ ] Yes
- [x] No
- [ ] I don't know
- Has unit tests been added:
- [x] Yes
- [ ] No
- [ ] No Need
- Has document been added or modified:
- [ ] Yes
- [ ] No
- [x] No Need
- Does it need to update dependencies:
- [ ] Yes
- [x] No
- Are there any changes that cannot be rolled back:
- [ ] Yes (If Yes, please explain WHY)
- [x] No
Further comments
If this is a relatively large or complex change, kick off the discussion on the Doris dev mailing list by explaining why you chose the solution you did and what alternatives you considered, etc...
clang-tidy review says "All clean, LGTM! :+1:"