velox icon indicating copy to clipboard operation
velox copied to clipboard

Add Decimal Between as scalar function

Open karteekmurthys opened this issue 2 years ago • 2 comments

VectorFunction registration provides means to specify a Decimal Type signature. Example: the DecimalArithemtic::Add has

exec::FunctionSignatureBuilder()
    .returnType("DECIMAL(r_precision, r_scale)")
    .argumentType("DECIMAL(a_precision, a_scale)")
    .argumentType("DECIMAL(b_precision, b_scale)")
    .variableConstraint(
        "r_precision",
        "min(38, max(a_precision - a_scale, b_precision - b_scale) + max(a_scale, b_scale) + 1)")
    .variableConstraint("r_scale", "max(a_scale, b_scale)")

However, scalar functions are registered using the CPP types and do not provide an API to specify a signature. The function signature is automatically deduced using TypeAnalysis in the registration code. Decimal types cannot be inferred from the CPP type alone due to missing precision and scale values. In this PR, we inject default signatures SHORT_DECIMAL(a_precision,a_scale), LONG_DECIMAL(a_precision,a_scale) for all decimal types in the TypeAnalysis phase. The semantics here will be that all arguments that are passed to a scalar function must have the same decimal precision, scale values. Any rescaling must be done explicitly using the decimal cast function. The Between function has been extended with this approach. Note that a default DECIMAL(a_precision,a_scale) signature cannot be used here since the signature binder needs to pick the right scalar function version (UnscaledShortDecimal vs UnscaledLongDecimal).

karteekmurthys avatar Aug 11 '22 23:08 karteekmurthys

Not able to reproduce the test failure on the docker image:

[ RUN      ] ComparisonsTest.betweenDecimal
../../velox/functions/prestosql/tests/ComparisonsTest.cpp:302: Failure
Expected equality of these values:
  actual->valueAt(i)
    Which is: true
  expectedResult[i]
    Which is: 0
[  FAILED  ] ComparisonsTest.betweenDecimal (0 ms)
[----------] 9 tests from ComparisonsTest (1 ms total)
Running main() from ../../third_party/googletest/googletest/src/gtest_main.cc
Note: Google Test filter = CastExprTest.decimalToDecimal
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from CastExprTest
[ RUN      ] CastExprTest.decimalToDecimal
WARNING: Logging before InitGoogleLogging() is written to STDERR
E0816 19:50:40.830755  2083 Exceptions.h:68] Line: ../.././velox/type/DecimalUtil.h:65, Function:rescaleWithRoundUp, Expression: Cannot cast DECIMAL '-1000.000' to DECIMAL(6,4), Source: RUNTIME, ErrorCode: INVALID_STATE
[       OK ] CastExprTest.decimalToDecimal (1 ms)
[----------] 1 test from CastExprTest (1 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (5 ms total)
[  PASSED  ] 1 test.

karteekmurthys avatar Aug 16 '22 20:08 karteekmurthys

Deploy Preview for meta-velox canceled.

Name Link
Latest commit 001762466222975e0de6a65d1d4b703d0f843aa6
Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/632b5791f3b7900008466621

netlify[bot] avatar Aug 22 '22 23:08 netlify[bot]

@karteekmurthys also update the documentation for scalar-function registration with the decimal type semantics in the description. https://facebookincubator.github.io/velox/develop/scalar-functions.html#registration

majetideepak avatar Aug 31 '22 16:08 majetideepak

@kagamiori has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot avatar Sep 09 '22 23:09 facebook-github-bot

@kagamiori just following up on this. Lmk if you need any help.

karteekmurthys avatar Sep 13 '22 16:09 karteekmurthys

@kagamiori just following up on this. Lmk if you need any help.

Hi Karteek, Thank you for reaching out. There are some internal test failures after importing this PR. I'm investigating those issues and will let you know if something needs to be changed here.

kagamiori avatar Sep 13 '22 18:09 kagamiori

Hi @karteekmurthys, I've looked into the internal test failure and here is what's going on.

A custom code retrieves all the registered Presto scalar functions, including the "between" function for Decimal introduced in this PR, and performs the following operations on each function signature.

for (const auto& arg : functionSignature->argumentTypes()) {
  // SignatureBinder::tryResolveType(arg, {}) converts a TypeSignature of "short_decimal(
  // a_precision, a_scale)" into a TypePtr of ScalarType<TypeKind::SHORT_DECIMAL>.
  if (auto resolvedType =
          velox::exec::SignatureBinder::tryResolveType(arg, {})) {
    args.emplace_back(resolvedType);
  } else {
    resolved = false;
    break;
  }
}
...
if (resolved) {
  // FunctionRegistry::resolveFunction(fnName, args) calls SignatureBinder::tryBind(
  // formalArgs[i], args[i]) to bind every pair of argument TypeSignature and TypePtr. 
  // SignatureBinder::tryBind() calls getDecimalPrecisionScale() on the TypePtr args[i]. 
  // getDecimalPrecisionScale() assumes the input type is a 
  // DecimalType<TypeKind::SHORT_DECIMAL> and does the casting type.asShortDecimal(). 
  // However, the TypePtr got from the code above is ScalarType<TypeKind::SHORT_DECIMAL>, 
  // so a bad dynamic_cast happens.
  auto fn = velox::exec::SimpleFunctions().resolveFunction(fnName, args)...
}

We think SignatureBinder::tryResolveType() should return a nullptr for "short_decimal(a_precision, a_scale)" unless a_precision and a_scale has been provided. I'm wondering if you're willing to make this fix?

kagamiori avatar Sep 14 '22 00:09 kagamiori

/ SignatureBinder::tryResolveType(arg, {}) converts a TypeSignature of "short_decimal( // a_precision, a_scale)" into a TypePtr of ScalarTypeTypeKind::SHORT_DECIMAL.

~~This is correct because DecimalType<TypeKind> is a ScalarType (inherits). See: https://github.com/facebookincubator/velox/blob/main/velox/type/Type.h#L621~~

~~Here, creating a ScalarTypeTypekind::SHORT_DECIMAL and dynamic_cast to DecimalType, it works:~~

std::shared_ptr<ScalarType<TypeKind::SHORT_DECIMAL>> ret = std::make_shared<ShortDecimalType>(10, 5);
    auto& decimalType = ret->asShortDecimal();
Screen Shot 2022-09-13 at 10 11 07 PM

~~In the above snapshot, decimalType is a valid object.~~

~~Lmk if I missed any step to ~~reproduce~~ the issue you mentioned.~~

Ignore the above message: I am able to reproduce the issue here, this seems to happen if precision and scale variables are not bound.

auto ret = exec::SignatureBinder::tryResolveType(shortSignature->argumentTypes()[0], {});
 auto& decimalType = ret->asShortDecimal();

karteekmurthys avatar Sep 14 '22 05:09 karteekmurthys

for (const auto& arg : functionSignature->argumentTypes()) { // SignatureBinder::tryResolveType(arg, {}) converts a TypeSignature of "short_decimal( // a_precision, a_scale)" into a TypePtr of ScalarTypeTypeKind::SHORT_DECIMAL. if (auto resolvedType = velox::exec::SignatureBinder::tryResolveType(arg, {})) { args.emplace_back(resolvedType); } else { resolved = false; break; } } ... if (resolved) { // FunctionRegistry::resolveFunction(fnName, args) calls SignatureBinder::tryBind( // formalArgs[i], args[i]) to bind every pair of argument TypeSignature and TypePtr. // SignatureBinder::tryBind() calls getDecimalPrecisionScale() on the TypePtr args[i]. // getDecimalPrecisionScale() assumes the input type is a // DecimalTypeTypeKind::SHORT_DECIMAL and does the casting type.asShortDecimal(). // However, the TypePtr got from the code above is ScalarTypeTypeKind::SHORT_DECIMAL, // so a bad dynamic_cast happens. auto fn = velox::exec::SimpleFunctions().resolveFunction(fnName, args)... }

tryResolveType in above case would invoke this method to create ShortDecimalType obj and ends up being a Scalar Type.

template <TypeKind KIND>
std::shared_ptr<const Type> createType(
    std::vector<std::shared_ptr<const Type>>&& children) {
  if (children.size() != 0) {
    throw std::invalid_argument{
        std::string(TypeTraits<KIND>::name) +
        " primitive type takes no children"};
  }
  static_assert(TypeTraits<KIND>::isPrimitiveType);
  return ScalarType<KIND>::create(); 
}

We have 2 options:

  1. Return nullptr (if you can confirm, it is ok in the above case if the args are not resolved).
  2. Specialize createType template to return SHORT_DECIMAL type object with default prec, scale.

@majetideepak @mbasmanova

karteekmurthys avatar Sep 14 '22 05:09 karteekmurthys

Return nullptr (if you can confirm, it is ok in the above case if the args are not resolved).

That should work. The method is already defined as best effort:

  // Returns concrete return type or null if couldn't fully resolve.
  static TypePtr tryResolveType(

mbasmanova avatar Sep 14 '22 10:09 mbasmanova

Return nullptr (if you can confirm, it is ok in the above case if the args are not resolved).

That should work. The method is already defined as best effort:

  // Returns concrete return type or null if couldn't fully resolve.
  static TypePtr tryResolveType(

Addressed this by defaulting to null if TypeSignature has ShortDecimal or LongDecimal. Today, we don't have any functions whose return type is Short/Long decimal. For Decimal arithmetic, we use generic DECIMAL(a, b) signature format.

karteekmurthys avatar Sep 14 '22 16:09 karteekmurthys

Hi @karteekmurthys, I believe https://github.com/facebookincubator/velox/pull/2544 has addressed that internal failure. Could you try rebasing this PR onto the latest main?

kagamiori avatar Sep 20 '22 00:09 kagamiori

@kagamiori has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot avatar Sep 20 '22 05:09 facebook-github-bot

FYI, there are some internal failures (different from those before). I'm rebasing the diff internally in case they were caused by version control.

kagamiori avatar Sep 20 '22 16:09 kagamiori

Hi @karteekmurthys, I rebased the diff onto the latest code and I'm seeing an error of velox_functions_test with the following stack trace:

Line: velox/type/Type.cpp:101, Function:mapNameToTypeKind, Expression: Specified element is not found : A_PRECISION, Source: USER, ErrorCode: INVALID_ARGUMENT
terminate called after throwing an instance of 'velox::VeloxUserError'
  what():  Exception: VeloxUserError
Error Source: USER
Error Code: INVALID_ARGUMENT
Reason: Specified element is not found : A_PRECISION
Retriable: False
Function: mapNameToTypeKind
File: velox/type/Type.cpp
Line: 101
Stack trace:
…
velox::detail::veloxCheckFail<velox::VeloxUserError, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&>(velox::detail::VeloxCheckFailArgs const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
                       ./velox/common/base/Exceptions.h:74

velox::mapNameToTypeKind(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
                       ./velox/type/Type.cpp:101
    
velox::exec::(anonymous namespace)::validateBaseTypeAndCollectTypeParams(…)
                       ./velox/expression/FunctionSignature.cpp:132

velox::exec::(anonymous namespace)::validateBaseTypeAndCollectTypeParams(…)
                       ./velox/expression/FunctionSignature.cpp:137

velox::exec::(anonymous namespace)::validate(…)
                       ./velox/expression/FunctionSignature.cpp:169
    
velox::exec::FunctionSignature::FunctionSignature(std::vector<velox::exec::TypeVariableConstraint, std::allocator<velox::exec::TypeVariableConstraint> >, velox::exec::TypeSignature, std::vector<velox::exec::TypeSignature, std::allocator<velox::exec::TypeSignature> >, bool)
                       ./velox/expression/FunctionSignature.cpp:205

…
velox::functions::registerComparisonFunctions()
                       ./velox/functions/prestosql/registration/ComparisonFunctionsRegistration.cpp:57

On the other hand, I do see all CircleCI tests are passed. Could you double check to make sure this PR has been rebased onto the latest code? Thanks!

kagamiori avatar Sep 20 '22 17:09 kagamiori

@kagamiori thanks for trying. I will rebase again and test it out.

karteekmurthys avatar Sep 20 '22 17:09 karteekmurthys

@kagamiori Rebase is good. Are you sure you didn't miss any changes from this PR when you rebased on your end. The error that you saw happens if velox/core/SimpleFunctionMetadata.h file changes are not picked up.

karteekmurthys avatar Sep 20 '22 19:09 karteekmurthys

@kagamiori Rebase is good. Are you sure you didn't miss any changes from this PR when you rebased on your end. The error that you saw happens if velox/core/SimpleFunctionMetadata.h file changes are not picked up.

Thanks for double checking! I did made sure all changes of this PR were included. Let me give it another try.

kagamiori avatar Sep 21 '22 01:09 kagamiori

@kagamiori has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot avatar Sep 21 '22 01:09 facebook-github-bot

@kagamiori is it the same test failure?

karteekmurthys avatar Sep 21 '22 16:09 karteekmurthys

@kagamiori is it the same test failure?

No, those are warnings for tests. There are two linter warnings about the definition of kPrecisionVariable and kScaleVariable in SimpleFunctionMetadata.h though:

Using namespace-level static in a .h file is near-always a bad idea: for simple types, static is implied; for aggregate types, static linkage means that a copy of the aggregate will appear in every compilation unit that includes the header; and for functions, static means the function will have a separate body generated in each compilation unit including the header.

Variables should not be defined in headers.
Taking the address of a non-inline header-defined variable in two or more translation units is an ODR violation.

Might it be possible to avoid these warnings?

kagamiori avatar Sep 21 '22 17:09 kagamiori

@kagamiori has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot avatar Sep 21 '22 18:09 facebook-github-bot

I am inlined the global constants. Hopefully this fixes the issue. Some references I found on this: https://stackoverflow.com/questions/50488831/use-of-constexpr-in-header-file

karteekmurthys avatar Sep 21 '22 18:09 karteekmurthys

I am inlined the global constants. Hopefully this fixes the issue. Some references I found on this: https://stackoverflow.com/questions/50488831/use-of-constexpr-in-header-file

Yep, it worked. Landing the diff now. Thank you very much for iterating on this PR!

kagamiori avatar Sep 21 '22 23:09 kagamiori