velox icon indicating copy to clipboard operation
velox copied to clipboard

Adding decimal support for min() and max() functions

Open minhancao opened this issue 11 months ago • 5 comments

Delivers https://github.com/facebookincubator/velox/issues/9004

This PR is for adding decimal support for min() and max() functions.

presto-cli output:

presto:tpch> SELECT MIN(Col,2) FROM (VALUES cast(0.82 as decimal(5,4)), cast(2.333 as decimal(5,4)), cast(3.132 as decimal(5,4)), cast(4.344 as decimal(5,4))) AS X(Col);
      _col0       
------------------
 [0.8200, 2.3330] 
(1 row)
presto:tpch> SELECT MIN(Col,3) FROM (VALUES cast(0.82 as decimal(5,4)), cast(2.333 as decimal(5,4)), cast(3.132 as decimal(5,4)), cast(4.344 as decimal(5,4))) AS X(Col);
          _col0           
--------------------------
 [0.8200, 2.3330, 3.1320] 
(1 row)
presto:tpch> SELECT MAX(Col,2) FROM (VALUES cast(0.82 as decimal(5,4)), cast(2.333 as decimal(5,4)), cast(3.132 as decimal(5,4)), cast(4.344 as decimal(5,4))) AS X(Col);
      _col0       
------------------
 [4.3440, 3.1320] 
(1 row)
presto:tpch> SELECT MAX(Col,3) FROM (VALUES cast(0.82 as decimal(5,4)), cast(2.333 as decimal(5,4)), cast(3.132 as decimal(5,4)), cast(4.344 as decimal(5,4))) AS X(Col);
          _col0           
--------------------------
 [4.3440, 3.1320, 2.3330] 
(1 row)

minhancao avatar Mar 07 '24 23:03 minhancao

Deploy Preview for meta-velox canceled.

Name Link
Latest commit 8abad4896ee1465b526b3337406adaccfc4c0fb3
Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/66280a4fae26f100087498f1

netlify[bot] avatar Mar 07 '24 23:03 netlify[bot]

@minhancao The linux-adapters job failed. This is likely an alignment issue.

 int32_t accumulatorAlignmentSize() const override {
    return 1;
  }

Please override this method:

 int32_t accumulatorAlignmentSize() const override {
    return 1; --> must return size of int128_t in case of long decimals.
  }

karteekmurthys avatar Mar 08 '24 00:03 karteekmurthys

@karteekmurthys I see this piece of code on line 99 of MinMaxAggregates.cpp, would this already have taken care of it?

/// Override 'accumulatorAlignmentSize' for UnscaledLongDecimal values as it
/// uses int128_t type. Some CPUs don't support misaligned access to int128_t
/// type.
template <>
inline int32_t MinMaxAggregate<int128_t>::accumulatorAlignmentSize() const {
  return static_cast<int32_t>(sizeof(int128_t));
}

minhancao avatar Mar 08 '24 01:03 minhancao

Also, the accumulator internally uses a heap for sorting. This heap allocator must be an aligned allocator:

template <typename T, typename Compare>
struct MinMaxNAccumulator {
  int64_t n{0};
  std::vector<T, StlAllocator<T>> heapValues;

  explicit MinMaxNAccumulator(HashStringAllocator* allocator)
      : heapValues{StlAllocator<T>(allocator)} {}

  int64_t getN() const {
    return n;
  }

  size_t size() const {
    return heapValues.size();
  }

Refer this:https://github.com/facebookincubator/velox/pull/8723/files#diff-5091b1dd8232c357067be47fb0e751772f91a0aa12dea5db835a2d4ae296a63cR93

karteekmurthys avatar Mar 11 '24 17:03 karteekmurthys

Hi @karteekmurthys, I have made the change that you have requested, please review when you can. thanks!

minhancao avatar Mar 27 '24 16:03 minhancao

@minhancao : Looks good. Thanks for contributing !

aditi-pandit avatar Apr 04 '24 22:04 aditi-pandit

@Yuhta would you please review this?

karteekmurthys avatar Apr 11 '24 20:04 karteekmurthys

Hi @Yuhta , can you review this PR when you can please? Thanks!

minhancao avatar Apr 22 '24 20:04 minhancao

@Yuhta I have ran locally the velox aggregation fuzzer test: min() function: command ran:

_build/debug/velox/functions/prestosql/fuzzer/velox_aggregation_fuzzer_test --seed 123456 --duration_sec 1200 --logtostderr=1 --minloglevel=0 --only min --v=1 2>&1 | tee agg_fuzzer_test_min_20mins.out

output:

I20240422 15:00:08.498713 3874823 AggregationFuzzer.cpp:456] ==============================> Done with iteration 442
I20240422 15:00:08.498742 3874823 AggregationFuzzer.cpp:993] Total masked aggregations: 67 (15.12%)
I20240422 15:00:08.498764 3874823 AggregationFuzzer.cpp:995] Total global aggregations: 33 (7.45%)
I20240422 15:00:08.498781 3874823 AggregationFuzzer.cpp:997] Total group-by aggregations: 325 (73.36%)
I20240422 15:00:08.498798 3874823 AggregationFuzzer.cpp:999] Total distinct aggregations: 46 (10.38%)
I20240422 15:00:08.498817 3874823 AggregationFuzzer.cpp:1001] Total aggregations over distinct inputs: 69 (15.58%)
I20240422 15:00:08.498834 3874823 AggregationFuzzer.cpp:1003] Total window expressions: 39 (8.80%)
I20240422 15:00:08.498852 3874823 AggregationFuzzerBase.cpp:647] Total functions tested: 1
I20240422 15:00:08.498868 3874823 AggregationFuzzerBase.cpp:648] Total iterations requiring sorted inputs: 71 (16.03%)
I20240422 15:00:08.498883 3874823 AggregationFuzzerBase.cpp:650] Total iterations verified against reference DB: 41 (9.26%)
I20240422 15:00:08.498899 3874823 AggregationFuzzerBase.cpp:652] Total functions not verified (verification skipped / not supported by reference DB / reference DB failed): 39 (8.80%) / 209 (47.18%) / 125 (28.22%)
I20240422 15:00:08.498917 3874823 AggregationFuzzerBase.cpp:657] Total failed functions: 32 (7.22%)
[==========] Running 0 tests from 0 test suites.
[==========] 0 tests from 0 test suites ran. (0 ms total)
[  PASSED  ] 0 tests.

max() function: command ran:

_build/debug/velox/functions/prestosql/fuzzer/velox_aggregation_fuzzer_test --seed 123456 --duration_sec 1200 --logtostderr=1 --minloglevel=0 --only max --v=1 2>&1 | tee agg_fuzzer_test_max_20mins.out

output:

I20240422 15:59:21.033170 3979822 AggregationFuzzer.cpp:456] ==============================> Done with iteration 451
I20240422 15:59:21.033193 3979822 AggregationFuzzer.cpp:993] Total masked aggregations: 68 (15.04%)
I20240422 15:59:21.033223 3979822 AggregationFuzzer.cpp:995] Total global aggregations: 34 (7.52%)
I20240422 15:59:21.033241 3979822 AggregationFuzzer.cpp:997] Total group-by aggregations: 333 (73.67%)
I20240422 15:59:21.033258 3979822 AggregationFuzzer.cpp:999] Total distinct aggregations: 46 (10.18%)
I20240422 15:59:21.033274 3979822 AggregationFuzzer.cpp:1001] Total aggregations over distinct inputs: 70 (15.49%)
I20240422 15:59:21.033290 3979822 AggregationFuzzer.cpp:1003] Total window expressions: 39 (8.63%)
I20240422 15:59:21.033306 3979822 AggregationFuzzerBase.cpp:647] Total functions tested: 1
I20240422 15:59:21.033322 3979822 AggregationFuzzerBase.cpp:648] Total iterations requiring sorted inputs: 74 (16.37%)
I20240422 15:59:21.033339 3979822 AggregationFuzzerBase.cpp:650] Total iterations verified against reference DB: 41 (9.07%)
I20240422 15:59:21.033355 3979822 AggregationFuzzerBase.cpp:652] Total functions not verified (verification skipped / not supported by reference DB / reference DB failed): 39 (8.63%) / 217 (48.01%) / 126 (27.88%)
I20240422 15:59:21.033372 3979822 AggregationFuzzerBase.cpp:657] Total failed functions: 32 (7.08%)
[==========] Running 0 tests from 0 test suites.
[==========] 0 tests from 0 test suites ran. (0 ms total)
[  PASSED  ] 0 tests.

minhancao avatar Apr 22 '24 23:04 minhancao

The branch seems messed up

Yuhta avatar Apr 23 '24 14:04 Yuhta

@Yuhta Fixed the branch

minhancao avatar Apr 23 '24 19:04 minhancao

@Yuhta has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot avatar Apr 23 '24 20:04 facebook-github-bot

@Yuhta merged this pull request in facebookincubator/velox@d97ddb410c8f3a957425bbfd91edbe7f016300de.

facebook-github-bot avatar Apr 25 '24 15:04 facebook-github-bot

Conbench analyzed the 1 benchmark run on commit d97ddb41.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

conbench-facebook[bot] avatar Apr 25 '24 15:04 conbench-facebook[bot]