[C++][Gandiva] A question about Gandiva's GetResultType for DecimalType
Describe the bug, including details regarding any error messages, version, and platform.
The code below calculates the decimal type's precision and scale.
https://github.com/apache/arrow/blob/30e6d72242e376baa598b2e8f1d9b80d800a974c/cpp/src/gandiva/decimal_type_util.cc#L62
It seems different from our Redshift’s decimal promotion rules; is there any relevant background here?
scale = max(4, s1 + p2 - s2 + 1)
precision = p1 - s1 + s2 + scale
It may cause decimal division results to differ from those of the normal compute expression.
Component(s)
C++ - Gandiva
@pravindra Could you take a look at this?
I think it's a problem and we should fix it. The code below demonstrates the problem:
// Normal compute expression
TEST(DecimalTest, Divide) {
  auto expr = arrow::compute::call("divide", {arrow::compute::field_ref("f1"),
                                              arrow::compute::field_ref("f2")});
  auto s = arrow::schema({arrow::field("f1", arrow::decimal128(11, 3)),
                          arrow::field("f2", arrow::decimal128(20, 9))});
  ASSERT_OK_AND_ASSIGN(expr, expr.Bind(*s));
  arrow::AssertTypeEqual(*expr.type(), *arrow::decimal128(32, 15));
  auto b = arrow::RecordBatchFromJSON(s, R"([
    ["1.982", "1.000000001"]
  ])");
  ASSERT_OK_AND_ASSIGN(auto input, arrow::compute::MakeExecBatch(*s, b));
  ASSERT_OK_AND_ASSIGN(auto res,
                       arrow::compute::ExecuteScalarExpression(expr, input));
  arrow::AssertArraysEqual(
      *res.make_array(),
      *arrow::ArrayFromJSON(arrow::decimal128(32, 15), R"(["1.981999998018000"])"));
}
// Gandiva test
TEST_F(TestDecimalOps, TestDivide) {
  DivideAndVerify(decimal_literal("1982", 11, 3),
                  decimal_literal("1000000001", 20, 9),
                  decimal_literal("1981999998018000001982", 38, 21));
}
We will have different precision and scale for the same data.
> Below codes could prove this problem:

I'm not sure what the important part in the code is. Could you explain it?
> our Redshift’s decimal promotion rules

Could you show the URL that describes the rule?
> is there any relevant background here?

Based on the kMinAdjustedScale comment, it seems that the current precision doesn't have a strong reason, but it's compatible with SQL Server and Impala:
https://github.com/apache/arrow/blob/6b1e254f3b62924f216e06e9e563e92c69f9efd3/cpp/src/gandiva/decimal_type_util.h#L54-L58
> I'm not sure what is the important part in the codes. Could you explain it?

The test code shows the decimal divide operation in a compute expression and in a Gandiva expression. We input the same decimal types, (11, 3) and (20, 9), with the same array data, but the results are different.
For the compute expression, the divide result's decimal type is (32, 15); but the Gandiva expression's result is (38, 21).
> Could you show the URL that describes the rule?

Our compute expression's rules are here: https://github.com/apache/arrow/blob/9f0a28f61a068059a6e53f25a3ffcb1689d701ec/cpp/src/arrow/compute/kernels/codegen_internal.cc#L421
And the compute docs: https://github.com/apache/arrow/blob/4fe364efa4be98b35964509e0e3d57a421a48039/docs/source/cpp/compute.rst#L501-L520.
> Based on the kMinAdjustedScale comment, it seems that the current precision doesn't have a strong reason, but it's compatible with SQL Server and Impala:
Yes, but maybe we should unify these rules between the compute expression and the Gandiva expression. Otherwise, the results will differ when users optimize expressions with Gandiva.
Ah, I understand. By "our Redshift’s decimal promotion rules" you were referring to our compute module's behavior. I thought you were referring to Redshift's rules, not our compute module's.
I don't object to Gandiva also using these rules, but it breaks backward compatibility. Can we support both rules and use the current rule by default for compatibility?
> I thought that you refer Redshift's rules not our compute module's one.
Actually, our compute module's decimal rules are the same as Redshift's rules. The compute module implements the rules in two steps:
- Implicit cast: the logic is in CastBinaryDecimalArgs.
- Output resolution: the logic is in ResolveDecimalDivisionOutput.
Both steps happen during expression Bind, and together they form the same rules as Redshift's numeric rules.
> I don't object to Gandiva also using these rules, but it breaks backward compatibility. Can we support both rules and use the current rule by default for compatibility?
Agree.
@kou Added the compatibility handling in the PR. PTAL?
Issue resolved by pull request 40434 https://github.com/apache/arrow/pull/40434