fix!: DH-20784: Mathematical agg ops should ignore NULLs, and poison with NaN

Open lbooker42 opened this issue 1 month ago • 1 comments

This PR implements proper NaN handling for mathematical aggregation operations on float and double types. The key change is that NULL values are ignored during aggregation, while NaN values "poison" the result (i.e., any NaN in the input causes NaN in the output). This aligns mathematical operations with standard IEEE 754 floating-point semantics.
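The rule above can be sketched in a few lines. This is a plain-Python illustration, not the engine's Java code: `None` stands in for NULL, `poisoned_sum` is a hypothetical helper name, and returning `None` for an all-NULL input is an assumption of the sketch rather than something this PR states.

```python
import math
from typing import Iterable, Optional


def poisoned_sum(values: Iterable[Optional[float]]) -> Optional[float]:
    """Sum that skips NULLs (None) and returns NaN as soon as a NaN is seen."""
    total = 0.0
    seen_value = False
    for v in values:
        if v is None:
            continue  # NULL: ignored, does not affect the result
        if math.isnan(v):
            return math.nan  # NaN: poisons the whole aggregation
        total += v
        seen_value = True
    # Assumption: an input with no non-NULL values yields NULL (None).
    return total if seen_value else None
```

With this helper, `poisoned_sum([1.0, None, 2.5])` ignores the NULL, while any NaN in the input makes the result NaN regardless of the other values.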

Key changes:

Added countNaN parameter to CompactKernel interface to control NaN counting separately from NULL handling
Updated min/max operators to return NaN immediately when encountered, and prevent updates once NaN is reached
Enhanced percentile calculations to detect and propagate NaN values
Added absSum methods to Numeric API with proper NaN handling
Updated test infrastructure with better NaN test coverage
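As a rough illustration of the min/max and percentile items, the sketch below shows a running minimum that becomes "sticky" once it sees NaN, and a percentile that checks for NaN before selecting a value. It is plain Python, not the operators changed in this PR: `PoisonedMin` and `poisoned_percentile` are hypothetical names, `None` stands in for NULL, and the percentile indexing rule (no interpolation) is an assumption made for the example.

```python
import math
from typing import Optional, Sequence


class PoisonedMin:
    """Running minimum: NULL (None) is ignored, NaN is sticky."""

    def __init__(self) -> None:
        self.result: Optional[float] = None

    def update(self, v: Optional[float]) -> None:
        if v is None:
            return  # NULL: ignored
        if self.result is not None and math.isnan(self.result):
            return  # already poisoned: no further updates
        if math.isnan(v):
            self.result = math.nan  # first NaN poisons the result
            return
        if self.result is None or v < self.result:
            self.result = v


def poisoned_percentile(values: Sequence[Optional[float]], p: float) -> Optional[float]:
    """Percentile over non-NULL values; NaN if any non-NULL value is NaN."""
    present = [v for v in values if v is not None]
    if not present:
        return None  # assumption: all-NULL input yields NULL
    if any(math.isnan(v) for v in present):
        return math.nan
    ordered = sorted(present)
    # Assumed indexing rule for this sketch: pick an element, no interpolation.
    idx = min(len(ordered) - 1, int(p * len(ordered)))
    return ordered[idx]
```

The explicit NaN checks matter because every ordered comparison with NaN is false, so a naive `v < current` loop would give a result that depends on where the NaN appears in the input.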

Verified that the following operations are validated against Numeric vector computations, adding tests where needed:

sumBy() / AggSpec.sum()
absSumBy() / AggSpec.absSum()
minBy() / AggSpec.min()
maxBy() / AggSpec.max()
medianBy() / AggSpec.median()
AggSpec.percentile()
avgBy() / AggSpec.avg()
stdBy() / AggSpec.std()
varBy() / AggSpec.var()
wavgBy() / Aggregation.wavg()
wsumBy() / Aggregation.wsum()

Nov 21 '25 17:11 lbooker42

No docs changes detected for 1037af063ecd3ed6d70b7fa2a4f3bb42ccc24b27

Nov 21 '25 17:11 github-actions[bot]