HIVE-28480: Disable SMB on partition hash generator mismatch across join branches…
… in previous RS
What changes were proposed in this pull request?
As SMB replaces last RS op from the joining branches and the JOIN op with MERGEJOIN, we need to ensure the RS before these RS, in both branches, are partitioning using same hash generator.
Hash code generator differs based on ReducerTraits.UNIFORM i.e. ReduceSinkOperator#computeMurmurHash() or ReduceSinkOperator#computeHashCode(), leading to different hash code for same value.
Skip SMB join in such cases.
Why are the changes needed?
It fixes bug that results in incorrect join result. Example query and behavior explanation is added in https://issues.apache.org/jira/browse/HIVE-28480
Does this PR introduce any user-facing change?
No
Is the change a dependency upgrade?
No
How was this patch tested?
Tested via auto_sortmerge_join_18.q added in this PR
The patch seems reasonable. LGTM
@okumin if this looks good, please merge the PR
@himanshu-mishra Sorry, I don't have the right to merge a PR as I am not a committer. You need approval of any committer who is ideally an expert of SMB join.
@kasakrisz can you please review or add reviewer
This is a good candidate to fix this issue, but have you checked why the reducer traits are different?
Consider the sample query
SELECT * FROM (
SELECT k, COUNT(DISTINCT v), SUM(v)
FROM t_asj_18 GROUP BY k
) a LEFT JOIN (
SELECT k, COUNT(v)
FROM t_asj_18 GROUP BY k
) b ON a.k = b.k;
In this query the aggregation k, count(distinct v).. group by k leads to group by and hence following RS operator's keys to be k, v while keeping partition column k, as we are aggregating on k. In such cases where key columns and partition columns differ, the UNIFORM trait is removed refer commit.
Note that the group by keys and partition columns differ only with distinct aggregate, removing that would lead to both being a thereby having UNIFORM trait.
@kasakrisz let me know if there are pending concerns.
@kasakrisz let me know if there are pending concerns.
Sorry, I was occupied in the last few days. I left only one comment.
Quality Gate passed
Issues
6 New issues
0 Accepted issues
Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code
@kasakrisz as you have approved the changes, can you help with merging the PR.