[PROPOSAL]: JoinIndexRuleV2: Proposal for supporting join optimizations independently on either side of a join node.
Problem Statement
Supporting an alternate version of join rule: JoinIndexRuleV2. This would allow independently using indexes on either side of an eligible Join node.
Update: We decided to upgrade JoinIndexRule (V1) to enhance its capabilities and remove its limitations instead of creating a new rule.
Background and Motivation
With JoinIndexRule in pace, we were able to get an overall improvement on TPC-DS queries by 18%. After analyzing some plans, we identified major areas of improvement which resulted in the ideas mentioned below. Prototyping it resulted in an overall improvement of 44%.
JoinIndexRule Behavior and Limitations:
- works on both sides of the join at the same time. It doesn't work independently on both sides, limiting the possibility of shuffle elimination on just one side wherever possible.
- works only on the lowest level join. for broadcast joins, it doesn't improve performance much.
- doesn't intelligently find a higher level shuffle based joins to optimize.
JoinIndexRuleV2:
- works on either sides of the join independently. It can pick indexes to eliminate shuffle on just one side also.
- optimizes shuffle based joins (e.g. SortMergeJoin), disregards broadcast join.
- can work on higher level joins if one side of the join can be replaced by index and can eliminate shuffle.
E.g.
SMJ
/ \
t1 BHJ
/ \
t2 t3
t1 can be replaced with t1Index to improve the SMJ. t2 can also be replaced with t2Index to improve the SMJ by propagating it's partitioning information up the BHJ.
Proposed Solution
Original Algorithm:
- Identify whether this join node can be optimized:
- We support only equi-joins in CNF forms. Also make sure the join columns are directly picked from scan nodes.
- This join is not a broadcast hash join. To check this, we independently check the left
and right sides of the join and make sure their size is less than
"spark.sql.autoBroadcastJoinThreshold"
- Independently check left and right sides of the join for available indexes. If an index is picked, the shuffle on that side could be eliminated.
Updates Algorithm
General Ideas
This approach upgrades JoinIndexRule (V1) with the desired capabilities of JoinIndexRuleV2 (V2). This approach also targets to overcome the challenges mentioned in the section: Potential Issues. Here are the suggested changes:
- Remove BHJ from V1 (use same logic from V2)
- Collect eligible indexes from left/right side. Find compatible indexes which work on both sides. Then replace. If no compatible indexes found, replace the size-wise bigger side with index.
- Remove linearity check requirement from (V1).
- Add output partitioning checks on either side of join node. Find eligible indexes only if output partitioning is same or None on the path from join node to a leaf node.
Potential Problem with this approach: After finalizing leftIndex and rightIndex, how to identify which left logical relation and right logical relation to replace with the index.
Eligibility of the join Condition
We still only support Equi-Join conditions in CNF form (A==B And C==D and E == F)
Eligibility of an Index
If an index leftIndex satisfies left side of the join node, and replace a relation leftRelation1, here are it's eligibility requirements (Same for rightIndex and right:
- All sub-conditions in the join
conditionmust contain one column each fromleftRelation. These columns are also the index columns ofleftIndex - Consider the tree path starting from the join node and ending at the leaf node
leftRelation1. TheoutputPartitioningof every node on this path should be either null/None or it should match the output partitioning requirement ofleft.
Compatibility of left and right indexes
If both left and right side are able to find eligible indexes, we need to find compatible indexes on these sides. Compatible indexes would be the ones where corresponding columns of left and right side (based on join condition) are in the same order in the indexed columns of chosen indexes.
Handling lack of compatibility or no-indexes
If leftIndex exists and rightIndex is null, we will directly apply leftIndex to leftRelation1. Similarly if leftIndex is null and rightIndex exists.
If both leftIndex and rightIndex exist but are not compatible, we will use stats (e.g. sizeInBytes) for leftRelation1 and rightRelation1 and decide which side gets optimized. We apply index only on that side.
Algorithm
- Validate eligibility of join condition: equi-join in CNF
- Find eligible indexes on
leftside andrightside based on eligibility criteria mentioned above - Find compatible indexes from left-right sides if both sides are non-empty.
- If compatible index pairs are found, replace the relevant
logicalRelationwith index - If compatible index pairs are not found, or one of the sides can't use an index, choose which side will be more benefitted by using the index. We can use stats (sizeInBytes) and pick the side with higher value if there's a conflict. Replace with index.
Known/Potential Compatibility Issues
Updates to the algorithm should address the following issues except issue number 4 (hybrid scan). We will not be targeting issue 4 because it's not that expensive if hybrid scan adds small amount of data.
1. There could be cases where this rule runs and selects indexes but doesn't improve the query performance. For e.g.
SMJ(t1.c = t3.c)
/ \
t1 BHJ
/ \
t2(streamed) t3(broadcasted)
In this example, replacing t3 with t3Index for the SMJ will not help much
2. Multi-column Joins
JoinRule V2 independently handles both left and right sides of the join. For multi-column join it cannot guarantee if both side indexes will be compatible
3. Ineligible SMJs in between, and hybrid scan is disabled. No improvement, no regression:
SMJ1
/ \ (shuffle for SMJ1)
t1 SMJ2 (not eligible for index)
/ \
shuffle shuffle
/ \
t2 t3
will be transformed:
SMJ1 (eligible for index, higher level SMJ)
/ \ (shuffle for SMJ1)
t1 SMJ2 (not eligible for index, lower level SMJ, different bucketing columns)
/ \
shuffle shuffle
/ \
t2 idx t3
t2 idx doesn't help as it anyway gets shuffled.
4. Ineligible SMJs in between, and hybrid scan is enabled. Could cause regression:
SMJ1
/ \ (shuffle for SMJ1)
t1 SMJ2 (not eligible for index)
/ \
shuffle shuffle
/ \
t2 t3
will be transformed:
SMJ1 (eligible for index, higher level SMJ)
/ \ (shuffle for SMJ1)
t1 SMJ2 (not eligible for index, lower level SMJ, different bucketing columns)
/ \
shuffle for SMJ2 shuffle - t3
/
bucket union for SMJ1
/ \
t2 idx shuffle for SMJ1
\
t2 appended
Implementation
https://github.com/microsoft/hyperspace/pull/124
Performance Implications (if applicable)
This could improve performance of joins which were previously not improved because of limitations of JoinRuleV1
Open issues (if applicable)
Additional context (if applicable) This was used as it is in this PR while showing the perf benefits in spark summit in June 2020. https://github.com/microsoft/hyperspace/pull/124
cc @imback82 @thugsatbay @AFFogarty @sezruby please have a look at the updates incorporated after discussion
Looks good to me! Could you update the PR accordingly?
Btw, if you get a chance, could you check if BHJ removal condition works for upper layer joins ? (one child is an output of another join..?) Please refer this plan: https://github.com/microsoft/hyperspace/pull/124#discussion_r572462083