pinot icon indicating copy to clipboard operation
pinot copied to clipboard

[multistage] Adding Stage node support for UncollectNode

Open xiangfu0 opened this issue 2 years ago • 8 comments

For unnest query support in v2:

SELECT a.col1, col22 FROM a CROSS JOIN UNNEST(a.col2) AS t(col22)
  • [X] Sql -> SqlNodeAndOption
  • [X] SqlNodeAndOption -> StageNode (Query Plan)

Next:

  1. Optimize StageNodes CorrelateNode + UncollectNode to UnnestNode for execution
  2. Plan execution.

xiangfu0 avatar Mar 28 '23 00:03 xiangfu0

Codecov Report

Merging #10488 (4eecc11) into master (2d6a38c) will increase coverage by 0.02%. The diff coverage is 61.95%.

@@             Coverage Diff              @@
##             master   #10488      +/-   ##
============================================
+ Coverage     70.33%   70.35%   +0.02%     
+ Complexity     6499     6493       -6     
============================================
  Files          2108     2110       +2     
  Lines        113535   113625      +90     
  Branches      17117    17122       +5     
============================================
+ Hits          79859    79946      +87     
+ Misses        28083    28078       -5     
- Partials       5593     5601       +8     
Flag Coverage Δ
integration1 24.33% <0.00%> (-0.17%) :arrow_down:
integration2 24.12% <0.00%> (-0.01%) :arrow_down:
unittests1 67.88% <61.95%> (-0.02%) :arrow_down:
unittests2 13.87% <0.00%> (-0.02%) :arrow_down:

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...java/org/apache/pinot/common/utils/DataSchema.java 79.41% <ø> (ø)
...e/pinot/query/planner/ExplainPlanStageVisitor.java 76.74% <0.00%> (-1.83%) :arrow_down:
...t/query/planner/logical/ShuffleRewriteVisitor.java 0.00% <0.00%> (ø)
...hysical/colocated/GreedyShuffleRewriteVisitor.java 0.00% <0.00%> (ø)
...lanner/stage/DefaultPostOrderTraversalVisitor.java 0.00% <0.00%> (ø)
.../pinot/query/runtime/plan/PhysicalPlanVisitor.java 89.79% <0.00%> (-3.83%) :arrow_down:
...t/query/runtime/plan/ServerRequestPlanVisitor.java 84.87% <0.00%> (-1.46%) :arrow_down:
...pache/pinot/query/planner/stage/CorrelateNode.java 66.66% <66.66%> (ø)
...che/pinot/query/planner/logical/RexExpression.java 87.37% <75.00%> (-6.47%) :arrow_down:
...pache/pinot/query/planner/stage/UncollectNode.java 83.33% <83.33%> (ø)
... and 3 more

... and 34 files with indirect coverage changes

:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more

codecov-commenter avatar Mar 28 '23 09:03 codecov-commenter

@xiangfu0 Would CROSS JOIN UNNEST between correlated LHS and RHS (UNNEST basically operates on a field that belongs to left join relation) results only in the cross product of the current row of the left table with all the rows from the UNNEST output?

nizarhejazi avatar Mar 29 '23 09:03 nizarhejazi

@xiangfu0 Would CROSS JOIN UNNEST between correlated LHS and RHS (UNNEST basically operates on a field that belongs to left join relation) results only in the cross product of the current row of the left table with all the rows from the UNNEST output?

that's the definition when doing

... FROM tbl CROSS JOIN UNNEST(tbl.arrCol)

see LATERAL & UNNEST in https://www.postgresql.org/docs/current/queries-table-expressions.html

walterddr avatar Mar 29 '23 14:03 walterddr

this PR can be closed if we have #10685 yes?

walterddr avatar Apr 25 '23 15:04 walterddr

this PR can be closed if we have #10685 yes?

I think UncollectNode is still need, CorrelateNode is not needed.

I will rebase once #10685 is merged.

xiangfu0 avatar Apr 27 '23 02:04 xiangfu0

this PR can be closed if we have #10685 yes?

I think UncollectNode is still need, CorrelateNode is not needed.

I will rebase once #10685 is merged.

Decorrelate won't work on the correlate + uncollect node. I think we should handle this in a rule and write our own StageNode for translation.

xiangfu0 avatar Apr 27 '23 23:04 xiangfu0

Decorrelate won't work on the correlate + uncollect node. I think we should handle this in a rule and write our own StageNode for translation.

hmm. this is probably expected as the unCollect is more like a UDTF (user-defined table function) that expands 1-row into n-rows. modeling it as a self-join will not simplify the plan

walterddr avatar Apr 28 '23 03:04 walterddr

For Unnest, we need to have a separated operator to work on it, it's not able to decorrelate.

xiangfu0 avatar Jun 25 '24 05:06 xiangfu0