
[C++][Gandiva] A question about Gandiva's GetResultType for DecimalType

Open ZhangHuiGui opened this issue 11 months ago • 2 comments

Describe the bug, including details regarding any error messages, version, and platform.

The code linked below calculates the result decimal type's precision and scale.

https://github.com/apache/arrow/blob/30e6d72242e376baa598b2e8f1d9b80d800a974c/cpp/src/gandiva/decimal_type_util.cc#L62

It seems different from our Redshift’s decimal promotion rules; is there any relevant background here?

scale = max(4, s1 + p2 - s2 + 1)
precision = p1 - s1 + s2 + scale
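
For example, applying this rule to the operand types used in the tests below, decimal128(11, 3) / decimal128(20, 9), gives:

scale = max(4, 3 + 20 - 9 + 1) = 15
precision = 11 - 3 + 9 + 15 = 32
result type: decimal128(32, 15)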

This may cause the decimal divide result to differ from what a normal compute expression calculates.

Component(s)

C++ - Gandiva

ZhangHuiGui avatar Mar 01 '24 14:03 ZhangHuiGui

@pravindra Could you take a look at this?

kou avatar Mar 03 '24 08:03 kou

I think it's a problem and we should fix it. The code below demonstrates the problem:

// Normal expression
TEST(DecimalTest, Divide) {
  auto expr = arrow::compute::call("divide", {arrow::compute::field_ref("f1"),
                                              arrow::compute::field_ref("f2")});
  auto s = arrow::schema({arrow::field("f1", arrow::decimal128(11, 3)),
                          arrow::field("f2", arrow::decimal128(20, 9))});
  ASSERT_OK_AND_ASSIGN(expr, expr.Bind(*s));
  arrow::AssertTypeEqual(*expr.type(), *arrow::decimal128(32, 15));

  auto b = arrow::RecordBatchFromJSON(s, R"([
    ["1.982", "1.000000001"]
   ])");
  ASSERT_OK_AND_ASSIGN(auto input, arrow::compute::MakeExecBatch(*s, b));
  ASSERT_OK_AND_ASSIGN(auto res,
                       arrow::compute::ExecuteScalarExpression(expr, input));
  arrow::AssertArraysEqual(
      *res.make_array(),
      *arrow::ArrayFromJSON(arrow::decimal128(32, 15), R"(["1.981999998018000"])"));
}

// gandiva test
TEST_F(TestDecimalOps, TestDivide) {
  DivideAndVerify(decimal_literal("1982", 11, 3),
                  decimal_literal("1000000001", 20, 9),
                  decimal_literal("1981999998018000001982", 38, 21));
}

We get different precision and scale for the same data.
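
The gap seems to come from the promotion rule itself. If I'm reading decimal_type_util.cc correctly (this is my interpretation of the linked code, not something confirmed elsewhere in this thread), Gandiva computes the divide result type roughly as scale = max(kMinAdjustedScale, s1 + p2 + 1) and precision = p1 - s1 + s2 + scale, then reduces the scale when the precision would exceed 38:

scale = max(kMinAdjustedScale, 3 + 20 + 1) = 24
precision = 11 - 3 + 9 + 24 = 41 > 38, so the scale is reduced by 41 - 38 = 3
result type: decimal128(38, 21)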

ZhangHuiGui avatar Mar 09 '24 09:03 ZhangHuiGui

The code below demonstrates the problem:

I'm not sure what the important part of the code is. Could you explain it?

our Redshift’s decimal promotion rules

Could you show the URL that describes the rule?

is there any relevant background here?

Based on the kMinAdjustedScale comment, it seems that the current precision rule doesn't have a strong reason behind it, but it's compatible with SQLServer and Impala:

https://github.com/apache/arrow/blob/6b1e254f3b62924f216e06e9e563e92c69f9efd3/cpp/src/gandiva/decimal_type_util.h#L54-L58

kou avatar Mar 14 '24 03:03 kou

I'm not sure what the important part of the code is. Could you explain it?

The tests perform the decimal divide operation as a compute expression and as a gandiva expression. We use the same decimal types, (11, 3) and (20, 9), with the same input array data, but the results are different.

For the compute expression, the divide result's decimal type is (32, 15), but the gandiva expression's result is (38, 21).

Could you show the URL that describes the rule?

Our compute expression's rules are here: https://github.com/apache/arrow/blob/9f0a28f61a068059a6e53f25a3ffcb1689d701ec/cpp/src/arrow/compute/kernels/codegen_internal.cc#L421

And the compute doc: https://github.com/apache/arrow/blob/4fe364efa4be98b35964509e0e3d57a421a48039/docs/source/cpp/compute.rst#L501-L520.

Based on the kMinAdjustedScale comment, it seems that the current precision rule doesn't have a strong reason behind it, but it's compatible with SQLServer and Impala:

Yes, but maybe we should unify these rules between the compute expression and the gandiva expression. Otherwise the results will differ when users optimize expressions with gandiva.

ZhangHuiGui avatar Mar 14 '24 03:03 ZhangHuiGui

Ah, I understand. You were referring to our compute module's behavior with "our Redshift’s decimal promotion rules". I thought that you were referring to Redshift's rules, not our compute module's.

I don't object to Gandiva also using these rules, but it breaks backward compatibility. Can we support both rules and use the current rule by default for compatibility?
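
A loudly hypothetical sketch of what that could look like (the enum, function name, and parameters below are made up for illustration and are not an actual Gandiva API): keep the current rule as the default and let callers opt into the compute-module rule.

// Hypothetical sketch only: a flag that selects between the existing
// Gandiva rule and the arrow::compute (Redshift-style) rule for divide.
#include <algorithm>
#include <cstdint>
#include <utility>

enum class DecimalDivideRule {
  kGandivaLegacy,  // current behavior, kept as the default
  kComputeModule,  // follow arrow::compute's promotion rule
};

// Returns {precision, scale} for dividend (p1, s1) / divisor (p2, s2).
std::pair<int32_t, int32_t> DivideResultType(int32_t p1, int32_t s1,
                                             int32_t p2, int32_t s2,
                                             DecimalDivideRule rule) {
  if (rule == DecimalDivideRule::kComputeModule) {
    // The Redshift-style rule quoted at the top of this issue; handling of
    // precision overflow (> 38 digits) is omitted from this sketch.
    int32_t scale = std::max<int32_t>(4, s1 + p2 - s2 + 1);
    int32_t precision = p1 - s1 + s2 + scale;
    return {precision, scale};
  }
  // My reading of the current decimal_type_util.cc behavior: a larger target
  // scale, then reduce the scale if the precision exceeds 38.
  int32_t scale = s1 + p2 + 1;
  int32_t precision = p1 - s1 + s2 + scale;
  if (precision > 38) {
    scale = std::max<int32_t>(scale - (precision - 38), 6);  // 6 ~ kMinAdjustedScale
    precision = 38;
  }
  return {precision, scale};
}

// DivideResultType(11, 3, 20, 9, DecimalDivideRule::kComputeModule) -> {32, 15}
// DivideResultType(11, 3, 20, 9, DecimalDivideRule::kGandivaLegacy) -> {38, 21}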

kou avatar Mar 14 '24 04:03 kou

I thought that you were referring to Redshift's rules, not our compute module's.

Actually, our compute module's decimal rules are the same as Redshift's rules. The compute module implements the rules in two steps:

  1. Implicit cast. The logic is in CastBinaryDecimalArgs.
  2. Output resolve. The logic is in ResolveDecimalDivisionOutput.

Both of these steps happen during expression Bind, and together they match Redshift's numeric rules.
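
Roughly, my understanding of what the two steps do for the example types in this thread (the intermediate scale below is my own inference from the target scale, not taken from the implementation):

step 1 (implicit cast):  the dividend decimal128(11, 3) is rescaled to scale 15 + 9 = 24,
                         so the integer quotient already carries the target scale
step 2 (output resolve): result type = decimal128(32, 15), per the rule quoted earlier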

I don't object to Gandiva also using these rules, but it breaks backward compatibility. Can we support both rules and use the current rule by default for compatibility?

Agree.

ZhangHuiGui avatar Mar 14 '24 06:03 ZhangHuiGui

@kou Added the compatibility support in the PR. PTAL?

ZhangHuiGui avatar Mar 14 '24 14:03 ZhangHuiGui

Issue resolved by pull request 40434 https://github.com/apache/arrow/pull/40434

kou avatar Mar 25 '24 05:03 kou