feat: ANSI support for Add
Which issue does this PR close?
Closes #536 .
Rationale for this change
This PR adds ANSI support for Add operator. This is done by adding a wrapper to BinaryExpr to add the different behavior between Spark and Datafusion. The wrapper is based on #593 because both PRs solve similar problems.
What changes are included in this PR?
The implementation is based on Java Math.addExact(a, b) because it is the function that uses Spark to solve this problem, but in this case using datafusion operators.
public static int addExact(int x, int y) {
// HD 2-12 Overflow iff both arguments have the opposite sign of the result
int r = x + y;
if (((x ^ r) & (y ^ r)) < 0) {
throw new ArithmeticException("integer overflow");
}
return r;
}
This PR excludes two things that I will do in subsequent PRs to avoid make this PR more complex:
- Support for DecimalType overflow check
- Spark try_add mode
How are these changes tested?
Unit testing
@dharanad Here is my solution based on your PR.
@dharanad Here is my solution based on your PR.
This look fine, once this PR is merged. I will extend my solution to this to solve #535 . @andygrove / @viirya Can you please help us with a review
Thanks for the contribution @planga82. I am reviewing this today.
@planga82 Please see https://github.com/apache/datafusion/pull/11400 that I just created against DataFusion, which possibly gives us most of what we need, although the same principals could be applied directly in Comet.
They key change was adding a fail_on_overflow config in BinaryExpr and then choosing to call either add or add_wrapped depending on that value.
@planga82 Please see apache/datafusion# v that I just created against DataFusion, which possibly gives us most of what we need, although the same principals could be applied directly in Comet.
They key change was adding a
fail_on_overflowconfig inBinaryExprand then choosing to call eitheraddoradd_wrappeddepending on that value.
This is amazing, thanks for https://github.com/apache/datafusion/pull/11400 . I have considered this idea, but I wasn't sure how to suggest this change.
Correct me if i am wrong, so this change will go in the next datafusion release and we are blocked on supporting ANSI until then ? Or can we plan to update datafusion dep updated once 11400 is merged ?
Thank you very much @andygrove for the explanation and the PR. In addition to @dharanad question
In Spark we have three different behaviors
ANSI mode disable --> Same behavior as Datafusion, return overflow value ANSI mode enabled
- x + y : Fail with overflow message
- try_add(x, y): Return null value on overflow
Should we implement this try_add behavior in Datafusion?
I am closing this PR since it has not been active lately. Feel free to re-open if needed.