datafusion-comet icon indicating copy to clipboard operation
datafusion-comet copied to clipboard

fix: modulo op with negative zero divisor produces Nan

Open vaibhawvipul opened this issue 1 year ago • 11 comments

Which issue does this PR close?

Closes #521 .

Rationale for this change

What changes are included in this PR?

Improves compatibility with apache spark.

How are these changes tested?

Added a relevant test case.

vaibhawvipul avatar Jun 18 '24 12:06 vaibhawvipul

Filed https://github.com/apache/datafusion/issues/11051 lets see

comphead avatar Jun 21 '24 17:06 comphead

imho it has to be fixed on DataFusion side. I can see it returns NaN

> select 1 % -0.0;
+------------------------+
| Int64(1) % Float64(-0) |
+------------------------+
| NaN                    |
+------------------------+

PG, Trino fails on such query, DuckDB, Spark returns NULL

I'll start a ticket in DF, if community supports to go with NULL, so no action needed for this ticket, if they decide to fail the query we might need some custom handling

Make sense. I will keep my PR as it then until we get some resolution on the datafusion ticket.

Thank you @comphead

vaibhawvipul avatar Jun 21 '24 17:06 vaibhawvipul

The direct reason why NaN is produced by dividing by -0.0 is that EqualTo condition is failing in nullIfWhenPrimitive() of QueryPlanSerde

DataFusion changed their behavior several versions ago, to fail for dividing by zero. The DataFusion community recommendation to keep old behavior was to put if condition such as nullIfWhenPrimitive().

I would say we should do something like

if (expression.dataType == DoubleType) {
    If(Or(EqualTo(expression, zero), EqualTo(expression, negZero)), Literal.create(null, expression.dataType), expression)
} else {
    If(EqualTo(expression, zero), Literal.create(null, expression.dataType), expression)
}

If is also supported natively, so it won't be too much overhead (at least that was the case last time I tried). I would hesitate to re-implement whole operator only for negative zero case. I feel we should reuse DataFusion as much as possible

kazuyukitanimura avatar Jun 21 '24 19:06 kazuyukitanimura

The direct reason why NaN is produced by dividing by -0.0 is that EqualTo condition is failing in nullIfWhenPrimitive() of QueryPlanSerde

DataFusion changed their behavior several versions ago, to fail for dividing by zero. The DataFusion community recommendation to keep old behavior was to put if condition such as nullIfWhenPrimitive().

I would say we should do something like

if (expression.dataType == DoubleType) {
    If(Or(EqualTo(expression, zero), EqualTo(expression, negZero)), Literal.create(null, expression.dataType), expression)
} else {
    If(EqualTo(expression, zero), Literal.create(null, expression.dataType), expression)
}

If is also supported natively, so it won't be too much overhead (at least that was the case last time I tried). I would hesitate to re-implement whole operator only for negative zero case. I feel we should reuse DataFusion as much as possible

makes a lot of sense. I have made a fix as per your comments.

please review @kazuyukitanimura

vaibhawvipul avatar Jun 22 '24 02:06 vaibhawvipul

can someone please re-run the CI? It seems like a task failed because of network error.

cc @kazuyukitanimura @comphead

vaibhawvipul avatar Jun 23 '24 15:06 vaibhawvipul

@kazuyukitanimura ready for review

vaibhawvipul avatar Jun 27 '24 20:06 vaibhawvipul

@kazuyukitanimura ready for review.

vaibhawvipul avatar Jun 28 '24 05:06 vaibhawvipul

Codecov Report

Attention: Patch coverage is 52.08333% with 23 lines in your changes missing coverage. Please review.

Project coverage is 34.08%. Comparing base (2fc45a2) to head (6e843da). Report is 20 commits behind head on main.

Files Patch % Lines
.../scala/org/apache/comet/serde/QueryPlanSerde.scala 52.08% 9 Missing and 14 partials :warning:
Additional details and impacted files
@@             Coverage Diff              @@
##               main     #585      +/-   ##
============================================
- Coverage     34.13%   34.08%   -0.06%     
+ Complexity      809      796      -13     
============================================
  Files           106      108       +2     
  Lines         38586    38767     +181     
  Branches       8566     8581      +15     
============================================
+ Hits          13172    13212      +40     
- Misses        22674    22799     +125     
- Partials       2740     2756      +16     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov-commenter avatar Jun 30 '24 21:06 codecov-commenter

@kazuyukitanimura could you please review once? CI passed.

vaibhawvipul avatar Jul 02 '24 03:07 vaibhawvipul

@vaibhawvipul Sorry, I got busy with something else. I may be unable to review or work on OSS a week or two.

kazuyukitanimura avatar Jul 02 '24 17:07 kazuyukitanimura

@andygrove / @kazuyukitanimura can we please get CI triggered? and also a review?

vaibhawvipul avatar Jul 26 '24 09:07 vaibhawvipul

The main issue was fixed by https://github.com/apache/datafusion-comet/pull/953 The test cases from this PR would be still helpful

kazuyukitanimura avatar Sep 23 '24 16:09 kazuyukitanimura

Please never mind my comment above, I posted a workaroud PR #960

kazuyukitanimura avatar Sep 23 '24 23:09 kazuyukitanimura

@kazuyukitanimura can we close it?

comphead avatar Sep 24 '24 21:09 comphead

Yes @comphead @vaibhawvipul

kazuyukitanimura avatar Sep 24 '24 21:09 kazuyukitanimura