datafusion-comet
datafusion-comet copied to clipboard
fix: modulo op with negative zero divisor produces Nan
Which issue does this PR close?
Closes #521 .
Rationale for this change
What changes are included in this PR?
Improves compatibility with apache spark.
How are these changes tested?
Added a relevant test case.
Filed https://github.com/apache/datafusion/issues/11051 lets see
imho it has to be fixed on DataFusion side. I can see it returns NaN
> select 1 % -0.0; +------------------------+ | Int64(1) % Float64(-0) | +------------------------+ | NaN | +------------------------+PG, Trino fails on such query, DuckDB, Spark returns NULL
I'll start a ticket in DF, if community supports to go with NULL, so no action needed for this ticket, if they decide to fail the query we might need some custom handling
Make sense. I will keep my PR as it then until we get some resolution on the datafusion ticket.
Thank you @comphead
The direct reason why NaN is produced by dividing by -0.0 is that EqualTo condition is failing in nullIfWhenPrimitive() of QueryPlanSerde
DataFusion changed their behavior several versions ago, to fail for dividing by zero. The DataFusion community recommendation to keep old behavior was to put if condition such as nullIfWhenPrimitive().
I would say we should do something like
if (expression.dataType == DoubleType) {
If(Or(EqualTo(expression, zero), EqualTo(expression, negZero)), Literal.create(null, expression.dataType), expression)
} else {
If(EqualTo(expression, zero), Literal.create(null, expression.dataType), expression)
}
If is also supported natively, so it won't be too much overhead (at least that was the case last time I tried). I would hesitate to re-implement whole operator only for negative zero case. I feel we should reuse DataFusion as much as possible
The direct reason why
NaNis produced by dividing by-0.0is thatEqualTocondition is failing innullIfWhenPrimitive()ofQueryPlanSerdeDataFusion changed their behavior several versions ago, to fail for dividing by zero. The DataFusion community recommendation to keep old behavior was to put if condition such as
nullIfWhenPrimitive().I would say we should do something like
if (expression.dataType == DoubleType) { If(Or(EqualTo(expression, zero), EqualTo(expression, negZero)), Literal.create(null, expression.dataType), expression) } else { If(EqualTo(expression, zero), Literal.create(null, expression.dataType), expression) }
Ifis also supported natively, so it won't be too much overhead (at least that was the case last time I tried). I would hesitate to re-implement whole operator only for negative zero case. I feel we should reuse DataFusion as much as possible
makes a lot of sense. I have made a fix as per your comments.
please review @kazuyukitanimura
can someone please re-run the CI? It seems like a task failed because of network error.
cc @kazuyukitanimura @comphead
@kazuyukitanimura ready for review
@kazuyukitanimura ready for review.
Codecov Report
Attention: Patch coverage is 52.08333% with 23 lines in your changes missing coverage. Please review.
Project coverage is 34.08%. Comparing base (
2fc45a2) to head (6e843da). Report is 20 commits behind head on main.
| Files | Patch % | Lines |
|---|---|---|
| .../scala/org/apache/comet/serde/QueryPlanSerde.scala | 52.08% | 9 Missing and 14 partials :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## main #585 +/- ##
============================================
- Coverage 34.13% 34.08% -0.06%
+ Complexity 809 796 -13
============================================
Files 106 108 +2
Lines 38586 38767 +181
Branches 8566 8581 +15
============================================
+ Hits 13172 13212 +40
- Misses 22674 22799 +125
- Partials 2740 2756 +16
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
@kazuyukitanimura could you please review once? CI passed.
@vaibhawvipul Sorry, I got busy with something else. I may be unable to review or work on OSS a week or two.
@andygrove / @kazuyukitanimura can we please get CI triggered? and also a review?
The main issue was fixed by https://github.com/apache/datafusion-comet/pull/953 The test cases from this PR would be still helpful
Please never mind my comment above, I posted a workaroud PR #960
@kazuyukitanimura can we close it?
Yes @comphead @vaibhawvipul