
HIVE-28488: Merge multiple adjacent union distinct into single adjacent union distinct

Open ngsg opened this issue 1 year ago • 1 comments

What changes were proposed in this pull request?

This PR proposes a new optimization that reduces shuffling when computing the UNION DISTINCT of multiple tables. The optimization merges the GroupBy operators that compute distinct after Union, thereby reducing the number of edges that UNION DISTINCT requires. A new configuration key, hive.optimize.merge.adjacent.union.distinct, controls this optimization.

Please see the slides attached to the JIRA page (HIVE-28488) for further explanation.
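
The equivalence that the merge relies on can be illustrated with a small standalone sketch. This is not the code in this PR and does not use any Hive classes; the class and method names (UnionDistinctSketch, unionDistinct, mergedUnionDistinct) are hypothetical, and plain Java sets stand in for the GroupBy-based deduplication. It only shows that deduplicating after every union gives the same rows as one union over all inputs followed by a single deduplication.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not part of Hive or of this patch.
public class UnionDistinctSketch {

    // Nested form: union two inputs, then deduplicate (stands in for Union + GroupBy).
    static Set<Integer> unionDistinct(Iterable<Integer> left, Iterable<Integer> right) {
        Set<Integer> out = new LinkedHashSet<>();
        for (Integer v : left) {
            out.add(v);
        }
        for (Integer v : right) {
            out.add(v);
        }
        return out;
    }

    // Merged form: a single union over all inputs followed by one deduplication.
    static Set<Integer> mergedUnionDistinct(List<List<Integer>> inputs) {
        Set<Integer> out = new LinkedHashSet<>();
        for (List<Integer> input : inputs) {
            out.addAll(input);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> a = List.of(1, 2, 2);
        List<Integer> b = List.of(2, 3);
        List<Integer> c = List.of(3, 4, 4);

        // (a UNION DISTINCT b) UNION DISTINCT c: two deduplication steps.
        Set<Integer> nested = unionDistinct(unionDistinct(a, b), c);
        // Merged: one deduplication step over a, b, and c together.
        Set<Integer> merged = mergedUnionDistinct(List.of(a, b, c));

        System.out.println(nested.equals(merged)); // prints true
    }
}
```

In the nested form the intermediate deduplication corresponds to an extra GroupBy and therefore an extra shuffle edge; the merged form produces the same result set with a single deduplication.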

Why are the changes needed?

To improve the execution time of queries containing many UNION DISTINCT operations.

Does this PR introduce any user-facing change?

No

Is the change a dependency upgrade?

No

How was this patch tested?

The proposed optimization was tested by running TPC-DS queries 49 and 75. This PR also contains a qfile test to confirm that the patch optimizes the query plan.

ngsg, Aug 30 '24 04:08

@deniskuzZ, I checked Sonar and resolved 3 of the 6 reported issues. The remaining issues are left unresolved for the following reasons:

https://github.com/apache/hive/blob/612afeb8d7268ebc46cacd44651795bb7e5f999e/ql/src/java/org/apache/hadoop/hive/ql/optimizer/UnionDistinctMerger.java#L73 This method uses the return statement only as a quick exit. Since its return value is never used, always returning the same value (null) is harmless.

https://github.com/apache/hive/blob/612afeb8d7268ebc46cacd44651795bb7e5f999e/ql/src/java/org/apache/hadoop/hive/ql/optimizer/UnionDistinctMerger.java#L79 context.pCtx.getAllOps() returns a Collection<Operator> with the raw Operator type, so the Set is declared with the same raw type.

https://github.com/apache/hive/blob/612afeb8d7268ebc46cacd44651795bb7e5f999e/ql/src/java/org/apache/hadoop/hive/ql/optimizer/UnionDistinctMerger.java#L136 Sonar flags this comment as a block of commented-out code, which it is not.

ngsg, Oct 31 '24 13:10