calcite icon indicating copy to clipboard operation
calcite copied to clipboard

[CALCITE-6435] SqlToRel conversion of IN expressions may lead to incorrect simplifications

Open kgyrtkirk opened this issue 1 year ago • 5 comments

Conversion path for comparisions generated from IN expressions was handling types differently. This may have lead to some over-simplification in some cases.

Altered the conversion to do the full SqlToRex conversion steps for these generated nodes as well. Added an extra safeguard check to RexSimplify to prevent the bug from being triggered.

kgyrtkirk avatar Jun 27 '24 15:06 kgyrtkirk

Let's wait a bit to see whether anyone else has comments.

mihaibudiu avatar Jun 28 '24 17:06 mihaibudiu

I think it can be simpler to modify the judgment of RexSimplify#simplifyAnd2ForUnknownAsFalse?

if (prevLiteral != null && !literal.equalsWithTypeName(prevLiteral)) {
     return rexBuilder.makeLiteral(false);
 }

where RexLiteral#equalsWithTypeName compares to TypeName instead of Type in RexLiteral#equals . In this case, varchar Sebastian and char Sebastian can be equivalent.

YiwenWu avatar Jun 29 '24 07:06 YiwenWu

I think it can be simpler to modify the judgment of RexSimplify#simplifyAnd2ForUnknownAsFalse?

if (prevLiteral != null && !literal.equalsWithTypeName(prevLiteral)) {
     return rexBuilder.makeLiteral(false);
 }

where RexLiteral#equalsWithTypeName compares to TypeName instead of Type in RexLiteral#equals . In this case, varchar Sebastian and char Sebastian can be equivalent.

I don't see any such method in RexLiteral did I missed it?

I would be kinda against of the introduction of such method because it would place custom type comparision logic inside a Rex implementation...

I might have not mentioned this here...but weakening these in RexSimplify has no adverse effects because predicate based simplification can do this too(and more) ; I think these map based things can even be removed....

kgyrtkirk avatar Jun 29 '24 09:06 kgyrtkirk

I think it can be simpler to modify the judgment of RexSimplify#simplifyAnd2ForUnknownAsFalse?

if (prevLiteral != null && !literal.equalsWithTypeName(prevLiteral)) {
     return rexBuilder.makeLiteral(false);
 }

where RexLiteral#equalsWithTypeName compares to TypeName instead of Type in RexLiteral#equals . In this case, varchar Sebastian and char Sebastian can be equivalent.

I don't see any such method in RexLiteral did I missed it?

I would be kinda against of the introduction of such method because it would place custom type comparision logic inside a Rex implementation...

I might have not mentioned this here...but weakening these in RexSimplify has no adverse effects because predicate based simplification can do this too(and more) ; I think these map based things can even be removed....

I think it can be simpler to modify the judgment of RexSimplify#simplifyAnd2ForUnknownAsFalse?

if (prevLiteral != null && !literal.equalsWithTypeName(prevLiteral)) {
     return rexBuilder.makeLiteral(false);
 }

where RexLiteral#equalsWithTypeName compares to TypeName instead of Type in RexLiteral#equals . In this case, varchar Sebastian and char Sebastian can be equivalent.

I don't see any such method in RexLiteral did I missed it?

I would be kinda against of the introduction of such method because it would place custom type comparision logic inside a Rex implementation...

I might have not mentioned this here...but weakening these in RexSimplify has no adverse effects because predicate based simplification can do this too(and more) ; I think these map based things can even be removed....

Currently, there is no RexLiteral#equalsWithTypeName method.

It seems two distinct approaches. This PR handles CHAR to VARCHAR conversion early in SqlToRelConverter , enabling direct conjunction in the simplification stage. Alternatively, refine the simplifyAnd2ForUnknownAsFalse method to bypass false judgments for RexLiteral with identical value and typeName(SqlTypeName).

There is no problem based on the approach of this PR. Only two small points that need to be confirmed:

  1. Addressing range concerns, the CHAR to VARCHAR conversion is specific to IN conditions?
  2. Is an update to simplifyAnd2ForUnknownAsFalse necessary, what scenarios would benefit from this change?

YiwenWu avatar Jun 30 '24 09:06 YiwenWu

  1. Addressing range concerns, the CHAR to VARCHAR conversion is specific to IN conditions?

the issue arised from the fact that SqlToRel had a different route to produce = checks when IN was rewritten to a set of OR conditions; this PR only makes that extra route go away - and shows the converter the full = expression to do the same conversion as it would have done in case it would have been part of the original query

  1. Is an update to simplifyAnd2ForUnknownAsFalse necessary, what scenarios would benefit from this change?

not mandatory for the current bug ; as 1 solves that ; but I would rather restrict the cases in which RexSimplify could over-simplify things; as its a quite frequently called part.

kgyrtkirk avatar Jul 06 '24 09:07 kgyrtkirk