Improve performance of Spark-compatible decimal aggregates

Open andygrove opened this issue 1 year ago • 1 comments

What is the problem the feature request solves?

The benchmarks added in https://github.com/apache/datafusion-comet/pull/948 show that Comet's Spark-compatible decimal aggregates are roughly 50% slower than the DataFusion equivalents (about 61% slower for avg and 47% slower for sum, comparing median times):

aggregate/avg_decimal_datafusion
                        time:   [653.56 µs 657.57 µs 662.06 µs]
aggregate/avg_decimal_comet
                        time:   [1.0581 ms 1.0592 ms 1.0604 ms]
aggregate/sum_decimal_datafusion
                        time:   [695.51 µs 696.48 µs 697.60 µs]
aggregate/sum_decimal_comet
                        time:   [1.0218 ms 1.0230 ms 1.0242 ms]

Describe the potential solution

No response

Additional context

No response

andygrove avatar Sep 18 '24 16:09 andygrove

Related upstream changes in arrow-rs: https://github.com/apache/arrow-rs/pull/6419

andygrove avatar Sep 20 '24 13:09 andygrove

Is it necessary to check is_valid_decimal_precision in update_single? Could update_single check only for i128 overflow, with is_valid_decimal_precision checked once at the final evaluate?

leung-ming avatar Aug 04 '25 15:08 leung-ming
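
To make the suggestion above concrete, here is a minimal sketch of deferring the precision check. It is not Comet's actual accumulator: `SumState`, its fields, and `max_unscaled_for_precision` are hypothetical names invented for illustration, and the range check is hand-rolled rather than calling the real is_valid_decimal_precision. It assumes decimals are stored as unscaled i128 values with a precision between 1 and 38.

```rust
/// Largest unscaled value representable with `precision` decimal digits,
/// i.e. 10^precision - 1. Hypothetical helper for this sketch only.
fn max_unscaled_for_precision(precision: u8) -> i128 {
    // 10^38 still fits in an i128, so the power cannot overflow in this range.
    assert!((1..=38).contains(&precision));
    10_i128.pow(precision as u32) - 1
}

/// Hypothetical per-group state for a decimal SUM.
struct SumState {
    sum: i128,
    overflowed: bool,
}

impl SumState {
    fn new() -> Self {
        SumState { sum: 0, overflowed: false }
    }

    /// Hot path: only detect i128 overflow, no precision check per row.
    fn update_single(&mut self, value: i128) {
        if self.overflowed {
            return;
        }
        match self.sum.checked_add(value) {
            Some(s) => self.sum = s,
            None => self.overflowed = true,
        }
    }

    /// Final step: validate the precision once, returning None (null) when
    /// the total overflowed i128 or does not fit in `precision` digits.
    fn evaluate(&self, precision: u8) -> Option<i128> {
        if self.overflowed {
            return None;
        }
        let max = max_unscaled_for_precision(precision);
        if self.sum >= -max && self.sum <= max {
            Some(self.sum)
        } else {
            None
        }
    }
}
```

One behavioral difference to be aware of: with the check deferred, a running sum that temporarily exceeds the target precision and then comes back into range (for example 60 + 50 - 20 at precision 2) evaluates to a value, whereas a per-row check would already have flagged the overflow. Whether that matches Spark's semantics would need to be verified before adopting this approach.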