datafusion-comet icon indicating copy to clipboard operation
datafusion-comet copied to clipboard

feat: ANSI support for Add

Open planga82 opened this issue 1 year ago • 6 comments

Which issue does this PR close?

Closes #536 .

Rationale for this change

This PR adds ANSI support for Add operator. This is done by adding a wrapper to BinaryExpr to add the different behavior between Spark and Datafusion. The wrapper is based on #593 because both PRs solve similar problems.

What changes are included in this PR?

The implementation is based on Java Math.addExact(a, b) because it is the function that uses Spark to solve this problem, but in this case using datafusion operators.

 public static int addExact(int x, int y) {
        // HD 2-12 Overflow iff both arguments have the opposite sign of the result
         int r = x + y;
         if (((x ^ r) & (y ^ r)) < 0) {
            throw new ArithmeticException("integer overflow");
         }
         return r;
    }

This PR excludes two things that I will do in subsequent PRs to avoid make this PR more complex:

  • Support for DecimalType overflow check
  • Spark try_add mode

How are these changes tested?

Unit testing

planga82 avatar Jul 01 '24 12:07 planga82

@dharanad Here is my solution based on your PR.

planga82 avatar Jul 01 '24 12:07 planga82

@dharanad Here is my solution based on your PR.

This look fine, once this PR is merged. I will extend my solution to this to solve #535 . @andygrove / @viirya Can you please help us with a review

dharanad avatar Jul 02 '24 06:07 dharanad

Thanks for the contribution @planga82. I am reviewing this today.

andygrove avatar Jul 09 '24 14:07 andygrove

@planga82 Please see https://github.com/apache/datafusion/pull/11400 that I just created against DataFusion, which possibly gives us most of what we need, although the same principals could be applied directly in Comet.

They key change was adding a fail_on_overflow config in BinaryExpr and then choosing to call either add or add_wrapped depending on that value.

andygrove avatar Jul 10 '24 19:07 andygrove

@planga82 Please see apache/datafusion# v that I just created against DataFusion, which possibly gives us most of what we need, although the same principals could be applied directly in Comet.

They key change was adding a fail_on_overflow config in BinaryExpr and then choosing to call either add or add_wrapped depending on that value.

This is amazing, thanks for https://github.com/apache/datafusion/pull/11400 . I have considered this idea, but I wasn't sure how to suggest this change.

Correct me if i am wrong, so this change will go in the next datafusion release and we are blocked on supporting ANSI until then ? Or can we plan to update datafusion dep updated once 11400 is merged ?

dharanad avatar Jul 11 '24 03:07 dharanad

Thank you very much @andygrove for the explanation and the PR. In addition to @dharanad question

In Spark we have three different behaviors

ANSI mode disable --> Same behavior as Datafusion, return overflow value ANSI mode enabled

  • x + y : Fail with overflow message
  • try_add(x, y): Return null value on overflow

Should we implement this try_add behavior in Datafusion?

planga82 avatar Jul 11 '24 12:07 planga82

I am closing this PR since it has not been active lately. Feel free to re-open if needed.

andygrove avatar Jun 10 '25 21:06 andygrove