velox icon indicating copy to clipboard operation
velox copied to clipboard

Extend expression fuzzer test to support decimal

Open rui-mo opened this issue 11 months ago • 17 comments

ArgumentTypeFuzzer could be used to generate argument types when constructing a decimal expression in the fuzzer test. However, given a result type, it is unable to produce argument types that satisfy the necessary constraints. To address this limitation, argument type generators for Presto and Spark decimal functions have been added. In this PR, ExpressionFuzzer takes a map from function name to an instance of the argument generator. Custom generators provided by Presto and Spark are used in the expression fuzzer test to generate argument types. Experimental fuzzer tests with decimal type enabled are added.

https://github.com/facebookincubator/velox/issues/1968

rui-mo avatar Mar 19 '24 07:03 rui-mo

Deploy Preview for meta-velox canceled.

Name Link
Latest commit d13c41bdce459487b05250532e66b733b99bd95d
Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/661cecf5b850cf00082999a0

netlify[bot] avatar Mar 19 '24 07:03 netlify[bot]

Hi @mbasmanova, I drafted this PR to support decimal in expression fuzzer test. Tested decimal_add and decimal_between locally for 10 minutes each. Below are some next-steps I can think of.

  • Verify other decimal functions as well (agg functions are tested in AggregationFuzzer, so they will not be covered in this PR).
  • Fix CI issues and enable velox_fuzzer_enable_decimal_type in CI.
  • Test decimal function registered as simple function when the relating PR is merged.

Besides that, do you have more items in mind which I should take care of to move forward? Thank you.

rui-mo avatar Mar 19 '24 08:03 rui-mo

This PR is updated and ready for review. Thanks.

rui-mo avatar Mar 26 '24 13:03 rui-mo

@rui-mo Thank you for working on adding support for decimal. I'll try to review this PR by tomorrow.

bikramSingh91 avatar Apr 01 '24 17:04 bikramSingh91

@rui-mo It seems challenging to generate valid argument types given a signature and a return type for functions that operate on decimal types.

For example, decimal addition takes 2 decimals (p1, s1) and (p2, s2) and returns a decimal (p, s), where

	p = max(p1 - s1, p2 - s2) + 1 + max(s1, s2)
	s = max(s1, s2)

Given a return type decimal(5, 3), we need to come up with p1, s1, p2, s2 such that

	5 = max(p1 - s1, p2 - s2) + 1 + max(s1, s2)
	3 = max(s1, s2)

Solving these equations generically is not easy, but writing custom generator might be straightforward.

We could extend the Fuzzer to introduce custom input generators. A generator would take a function name, signature and concrete return type and produce a list of concrete argument types and optional constant argument values. The fuzzer will take this result and generate expressions for non-constant arguments.

mbasmanova avatar Apr 03 '24 12:04 mbasmanova

We could extend the Fuzzer to introduce custom input generators. A generator would take a function name, signature and concrete return type and produce a list of concrete argument types

@rui-mo Rui, here is a sketch of what I have in mind: https://github.com/facebookincubator/velox/pull/9356

Let me know what you think.

mbasmanova avatar Apr 03 '24 15:04 mbasmanova

@rui-mo Rui, I found that reversing the formulas for decimal results is tricky. Instead, I tried generating all possible decimal input types, compute result types and store these in a map keyed on result type. This mapping is then used to generate input types for a given result type: #9358

What do you think?

mbasmanova avatar Apr 04 '24 01:04 mbasmanova

@rui-mo Rui, I also ran into a bug in DecimalType. It allows zero precision: DECIMAL(0, 0) works. This should not be allowed. Turns out kMinPrecision is 0 for short decimal, not 1.

mbasmanova avatar Apr 04 '24 01:04 mbasmanova

Hi @mbasmanova, appreciate your great efforts on the decimal fuzzer test.

generating all possible decimal input types, compute result types and store these in a map keyed on result type.

Indeed this can solve the limitation of my current implementation. Thanks!

rui-mo avatar Apr 04 '24 02:04 rui-mo

Turns out kMinPrecision is 0 for short decimal, not 1.

@mbasmanova Do we need to fix this in Velox? Thanks.

rui-mo avatar Apr 04 '24 02:04 rui-mo

@rui-mo Rui, yes, we need to fix kMinPrecision. I'll work on a PR.

mbasmanova avatar Apr 04 '24 11:04 mbasmanova

generating all possible decimal input types, compute result types and store these in a map keyed on result type.

@rui-mo I also tried doing that generically by evaluating constraints specified in the signature, but that was too slow. Presumably because for each pair of input types we need to parse the constraint formula, then evaluate. I'm not sure if there is a way to parse / compile the formula once and evaluate many time. @majetideepak Deepak, do you happen to know if that's possible?

mbasmanova avatar Apr 04 '24 11:04 mbasmanova

we could continue make process on decimal support in the fuzzer: https://github.com/facebookincubator/velox/pull/9149

Hi @mbasmanova , as quoted from your comment, should I continue the decimal fuzzer work in this PR? Because I notice you have created https://github.com/facebookincubator/velox/pull/9358, I'm not sure if you are working it. If you want me to continue this work, I am glad to revise this PR following your great existing work. Thanks.

rui-mo avatar Apr 11 '24 02:04 rui-mo

should I continue the decimal fuzzer work in this PR?

@rui-mo Rui, it would be great if you could continue this work. Let me know if you need any help from me. Thanks.

mbasmanova avatar Apr 11 '24 05:04 mbasmanova

@mbasmanova Hi Masha, I'd like to break up this PR into several parts. Could you spare some time to review? Thanks!

  1. To fix the issues found through running fuzzer test, including #9631 and #9632.
  2. To add the base class of arg generator with #9634.
  3. To add customized arg generators for Presto and Spark functions.
  4. To make the decimal fuzzer test work with these arg generators.
  5. Others to be updated when needed.

rui-mo avatar Apr 26 '24 08:04 rui-mo

I'd like to break up this PR into several parts.

@rui-mo I love this plan. Thanks. Should make it easier to review and safer to land.

mbasmanova avatar Apr 26 '24 11:04 mbasmanova