velox
velox copied to clipboard
Extend expression fuzzer test to support decimal
ArgumentTypeFuzzer could be used to generate argument types when constructing a decimal expression in the fuzzer test. However, given a result type, it is unable to produce argument types that satisfy the necessary constraints. To address this limitation, argument type generators for Presto and Spark decimal functions have been added. In this PR, ExpressionFuzzer takes a map from function name to an instance of the argument generator. Custom generators provided by Presto and Spark are used in the expression fuzzer test to generate argument types. Experimental fuzzer tests with decimal type enabled are added.
https://github.com/facebookincubator/velox/issues/1968
Deploy Preview for meta-velox canceled.
Name | Link |
---|---|
Latest commit | d13c41bdce459487b05250532e66b733b99bd95d |
Latest deploy log | https://app.netlify.com/sites/meta-velox/deploys/661cecf5b850cf00082999a0 |
Hi @mbasmanova, I drafted this PR to support decimal in expression fuzzer test. Tested decimal_add and decimal_between locally for 10 minutes each. Below are some next-steps I can think of.
- Verify other decimal functions as well (agg functions are tested in AggregationFuzzer, so they will not be covered in this PR).
- Fix CI issues and enable
velox_fuzzer_enable_decimal_type
in CI. - Test decimal function registered as simple function when the relating PR is merged.
Besides that, do you have more items in mind which I should take care of to move forward? Thank you.
This PR is updated and ready for review. Thanks.
@rui-mo Thank you for working on adding support for decimal. I'll try to review this PR by tomorrow.
@rui-mo It seems challenging to generate valid argument types given a signature and a return type for functions that operate on decimal types.
For example, decimal addition takes 2 decimals (p1, s1) and (p2, s2) and returns a decimal (p, s), where
p = max(p1 - s1, p2 - s2) + 1 + max(s1, s2)
s = max(s1, s2)
Given a return type decimal(5, 3), we need to come up with p1, s1, p2, s2 such that
5 = max(p1 - s1, p2 - s2) + 1 + max(s1, s2)
3 = max(s1, s2)
Solving these equations generically is not easy, but writing custom generator might be straightforward.
We could extend the Fuzzer to introduce custom input generators. A generator would take a function name, signature and concrete return type and produce a list of concrete argument types and optional constant argument values. The fuzzer will take this result and generate expressions for non-constant arguments.
We could extend the Fuzzer to introduce custom input generators. A generator would take a function name, signature and concrete return type and produce a list of concrete argument types
@rui-mo Rui, here is a sketch of what I have in mind: https://github.com/facebookincubator/velox/pull/9356
Let me know what you think.
@rui-mo Rui, I found that reversing the formulas for decimal results is tricky. Instead, I tried generating all possible decimal input types, compute result types and store these in a map keyed on result type. This mapping is then used to generate input types for a given result type: #9358
What do you think?
@rui-mo Rui, I also ran into a bug in DecimalType. It allows zero precision: DECIMAL(0, 0) works. This should not be allowed. Turns out kMinPrecision is 0 for short decimal, not 1.
Hi @mbasmanova, appreciate your great efforts on the decimal fuzzer test.
generating all possible decimal input types, compute result types and store these in a map keyed on result type.
Indeed this can solve the limitation of my current implementation. Thanks!
Turns out kMinPrecision is 0 for short decimal, not 1.
@mbasmanova Do we need to fix this in Velox? Thanks.
@rui-mo Rui, yes, we need to fix kMinPrecision. I'll work on a PR.
generating all possible decimal input types, compute result types and store these in a map keyed on result type.
@rui-mo I also tried doing that generically by evaluating constraints specified in the signature, but that was too slow. Presumably because for each pair of input types we need to parse the constraint formula, then evaluate. I'm not sure if there is a way to parse / compile the formula once and evaluate many time. @majetideepak Deepak, do you happen to know if that's possible?
we could continue make process on decimal support in the fuzzer: https://github.com/facebookincubator/velox/pull/9149
Hi @mbasmanova , as quoted from your comment, should I continue the decimal fuzzer work in this PR? Because I notice you have created https://github.com/facebookincubator/velox/pull/9358, I'm not sure if you are working it. If you want me to continue this work, I am glad to revise this PR following your great existing work. Thanks.
should I continue the decimal fuzzer work in this PR?
@rui-mo Rui, it would be great if you could continue this work. Let me know if you need any help from me. Thanks.
@mbasmanova Hi Masha, I'd like to break up this PR into several parts. Could you spare some time to review? Thanks!
I'd like to break up this PR into several parts.
@rui-mo I love this plan. Thanks. Should make it easier to review and safer to land.