spark icon indicating copy to clipboard operation
spark copied to clipboard

[SPARK-39841][SQL] simplify conflict binary comparison

Open yikf opened this issue 3 years ago • 8 comments

What changes were proposed in this pull request?

Currently, We do not handle conflicting binary comparison expressions, Let's say have a query, it filter a < 0 and a > 1, this condition should be a FalseLiteral, we can fold ahead of time to avoid unnecessary calculations.

Have a idea, we can capture a and conjunction and both it's left and right are BinaryComparisonExpression, we optimize this predicate as following step:

  • Normal and left and right comparison expression, to keep Attribute is left side and literal is right side.
  • If both comparison expression left child is semantic equal and right child is foldable, we compute a range for column predicate comparison .
  • Then update this column comparison predicate by this range.

Before this pr: image

After this pr: image

Why are the changes needed?

Fold ahead to avoid unnecessary calculations & improve performance

Does this PR introduce any user-facing change?

No, improve only

How was this patch tested?

Add new tests

yikf avatar Jul 22 '22 12:07 yikf

Can one of the admins verify this patch?

AmplabJenkins avatar Jul 22 '22 14:07 AmplabJenkins

@cloud-fan could you please take a look when you have a time, thanks

yikf avatar Jul 22 '22 14:07 yikf

can you briefly explain your idea? Do you keep a range for each column and update the range when seeing a comparison? Then use the range to update the column comparison predicate?

cloud-fan avatar Jul 22 '22 15:07 cloud-fan

can you briefly explain your idea? Do you keep a range for each column and update the range when seeing a comparison? Then use the range to update the column comparison predicate?

Yea, if capture a and conjunction and both it's left and right are BinaryComparisonExpression, we optimize this predicate as following step:

  • Normal and left and right comparison expression, to keep Attribute is left side and literal is right side.
  • If both comparison expression left child is semantic equal and right child is foldable, we compute a range for comparison left side.
  • Then update this column comparison predicate by this range.

yikf avatar Jul 22 '22 16:07 yikf

cc @gengliangwang

cloud-fan avatar Jul 25 '22 10:07 cloud-fan

friendly ping @gengliangwang @cloud-fan , could you please take a look when you have a time?

yikf avatar Jul 29 '22 13:07 yikf

cc @sigmod as well

gengliangwang avatar Aug 01 '22 20:08 gengliangwang

friendly ping @gengliangwang @cloud-fan @sigmod, Sorry for late reply, i was busy last week, i fixed comments you left, please take a look again when you have a time~

yikf avatar Aug 05 '22 13:08 yikf

@Yikf Is there any mainstream database that supports this optimization?

wangyum avatar Aug 30 '22 11:08 wangyum

@Yikf Is there any mainstream database that supports this optimization?

I found a similar optimization in Trino; as follow:

image

yikf avatar Aug 30 '22 11:08 yikf