
[SPARK-40177][SQL] Simplify condition of form (a==b) || (a==null&&b==null) to a<=>b

Open ayushi-agarwal opened this issue 2 years ago • 3 comments

What changes were proposed in this pull request?

A new case is added to Boolean simplification to convert conditions of the form (a==b) || (a==null&&b==null) to a<=>b.
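The reasoning behind the rewrite can be checked with a small model of SQL three-valued logic. The sketch below is plain Scala, not Catalyst code: `None` stands in for SQL NULL, and every name in it (`NullSafeEqualityModel`, `sqlEq`, `keeps`, and so on) is hypothetical and chosen only for illustration. It suggests that the two forms agree on which rows a join or filter keeps, although they are not identical as values: when exactly one side is NULL, the original form evaluates to NULL while `<=>` evaluates to false.

```scala
// Toy model of SQL three-valued logic; NOT Spark's implementation.
// None plays the role of SQL NULL.
object NullSafeEqualityModel {
  type SqlBool = Option[Boolean]

  // a = b: NULL when either operand is NULL
  def sqlEq(a: Option[Int], b: Option[Int]): SqlBool =
    for (x <- a; y <- b) yield x == y

  // isnull(a): always TRUE or FALSE, never NULL
  def isNull(a: Option[Int]): SqlBool = Some(a.isEmpty)

  // SQL AND: FALSE dominates, otherwise NULL propagates
  def and(p: SqlBool, q: SqlBool): SqlBool = (p, q) match {
    case (Some(false), _) | (_, Some(false)) => Some(false)
    case (Some(true), Some(true))            => Some(true)
    case _                                   => None
  }

  // SQL OR: TRUE dominates, otherwise NULL propagates
  def or(p: SqlBool, q: SqlBool): SqlBool = (p, q) match {
    case (Some(true), _) | (_, Some(true)) => Some(true)
    case (Some(false), Some(false))        => Some(false)
    case _                                 => None
  }

  // a <=> b: null-safe equality, never NULL
  def nullSafeEq(a: Option[Int], b: Option[Int]): SqlBool = Some(a == b)

  // (a = b) || (isnull(a) && isnull(b))
  def original(a: Option[Int], b: Option[Int]): SqlBool =
    or(sqlEq(a, b), and(isNull(a), isNull(b)))

  // A join or filter keeps a row only when the predicate is TRUE
  def keeps(p: SqlBool): Boolean = p.contains(true)

  val samples: Seq[Option[Int]] = Seq(None, Some(1), Some(2))
  val pairs: Seq[(Option[Int], Option[Int])] =
    for (a <- samples; b <- samples) yield (a, b)

  // Same rows kept for every combination of NULL / non-NULL inputs
  val sameRowsKept: Boolean =
    pairs.forall { case (a, b) => keeps(original(a, b)) == keeps(nullSafeEq(a, b)) }

  // Not value-identical: NULL vs. FALSE when exactly one side is NULL
  val sameValue: Boolean =
    pairs.forall { case (a, b) => original(a, b) == nullSafeEq(a, b) }
}
```

In this model `sameRowsKept` is true and `sameValue` is false. That is enough when the condition is used directly as a join or filter predicate, where NULL and false both reject the row, but it is worth keeping in mind if the expression can appear under a negation.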

Why are the changes needed?

If a join condition has the form key1==key2 || (key1==null && key2==null), the join is executed as a Broadcast Nested Loop Join (BNLJ) because the condition does not qualify as an equi-join condition. BNLJ takes more time than a sort merge join or a broadcast hash join. Converting the condition to key1<=>key2 lets the join execute as a broadcast hash join or sort merge join, which improves the performance of queries whose join condition matches this pattern.

Sample query: val dfAns = df.join(df1, (df("v")===df1("x") or (isnull(df("v")) and isnull(df1("x")))), "leftanti")

Plan before change

OptimizedPlan:
Join LeftAnti, ((v#1 = x#15) || (isnull(v#1) && isnull(x#15)))
:- LocalRelation [g#0, v#1, o#2, x#3]
+- LocalRelation [x#15]

dfAns.queryExecution.executedPlan:
*(1) BroadcastNestedLoopJoin BuildRight, LeftAnti, ((v#256 = x#270) || (isnull(v#256) && isnull(x#270)))
:- LocalTableScan [g#255, v#256, o#257, x#258]
+- BroadcastExchange IdentityBroadcastMode, [id=#91]
   +- LocalTableScan [x#270]

Plan after change

OptimizedPlan:
Join LeftAnti, (v#29 <=> x#79)
:- LocalRelation [g#28, v#29, o#30, x#31]
+- LocalRelation [x#79]

ExecutedPlan:
*(1) BroadcastHashJoin [coalesce(v#29, 0), isnull(v#29)], [coalesce(x#71, 0), isnull(x#71)], LeftAnti, BuildRight
:- LocalTableScan [g#28, v#29, o#30, x#31]
+- BroadcastExchange HashedRelationBroadcastMode(ArrayBuffer(coalesce(input[0, int, false], 0), isnull(input[0, int, false]))), [id=#57]
   +- LocalTableScan [x#71]
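Independently of this optimizer rule, the same null-safe condition can be written by hand with the `<=>` operator on `Column`. The following is a minimal sketch, assuming a local SparkSession and small made-up inputs; the object name `NullSafeJoinSketch`, the `run` helper, and the data are hypothetical and are not taken from the PR. The physical plan printed by `explain()` depends on the Spark version and configuration, so it is not reproduced here.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical sketch: writes the null-safe join condition directly with <=>.
object NullSafeJoinSketch {
  def run(spark: SparkSession): Seq[Int] = {
    import spark.implicits._

    // Made-up data: None becomes a NULL in the resulting column
    val df  = Seq[Option[Int]](Some(1), None, Some(3)).toDF("v")
    val df1 = Seq[Option[Int]](Some(1), None).toDF("x")

    // Null-safe equality: NULL <=> NULL is true, NULL <=> 1 is false
    val dfAns = df.join(df1, df("v") <=> df1("x"), "leftanti")

    // Prints the physical plan so the chosen join strategy can be inspected
    dfAns.explain()

    // Rows of df with no null-safe match in df1
    dfAns.collect().map(_.getInt(0)).toSeq
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("null-safe-join-sketch")
      .getOrCreate()
    try println(run(spark))
    finally spark.stop()
  }
}
```

With these inputs the anti join leaves only the row with v = 3, since both 1 and NULL find a null-safe match in df1.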

Does this PR introduce any user-facing change?

No

How was this patch tested?

Unit tests run

ayushi-agarwal, Aug 23 '22 09:08

Can one of the admins verify this patch?

AmplabJenkins, Aug 23 '22 15:08

Gentle ping @cloud-fan @srowen: could you please help verify this patch?

ayushi-agarwal, Aug 24 '22 07:08

cc @sigmod @wangyum

cloud-fan, Aug 24 '22 09:08

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

github-actions[bot], Dec 27 '22 00:12