Performance 18254756429: Improve hash grouping aggregation parallelism

Open alexowens90 opened this issue 1 month ago • 0 comments

Reference Issues/PRs

18254756429

What does this implement or fix?

Poor-quality hash implementations for integral types, including at least some implementations of std::hash, are essentially a static cast, e.g. std::hash<int64_t>{}(100) == 100. This is fast, but leads to poor distributions in our bucketing, where we mod the hash with the number of buckets. In particular, performing a grouping hash on a timeseries whose time points are dates results in all of the rows being partitioned into bucket zero, which then results in no parallelism in the aggregation clause.

Swap to using a consistent hash function across all supported platforms with improved uniformity.
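
A minimal sketch of the effect, not the code in this PR. It assumes timestamps are stored as nanoseconds since the epoch and that the bucket count is a power of two no larger than 65536; both are assumptions made for illustration. Under them, every midnight timestamp is a multiple of 2^16, so an identity hash modulo the bucket count is always zero. `identity_hash`, `mixed_hash`, and `bucket_counts` are hypothetical names, and the mixer shown is the splitmix64 finalizer, chosen only as a well-known example of a uniform 64-bit mixer, not necessarily the function adopted here.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Nanoseconds in one day: 86'400 * 10^9 = 2^16 * 1'318'359'375,
// so every midnight timestamp is a multiple of 65536.
constexpr std::int64_t NANOS_PER_DAY = 86'400LL * 1'000'000'000LL;

// Stand-in for a hash that is just a static cast (hypothetical helper).
inline std::uint64_t identity_hash(std::int64_t v) {
    return static_cast<std::uint64_t>(v);
}

// splitmix64 finalizer: an illustrative mixer, assumed here, not taken from the PR.
inline std::uint64_t mixed_hash(std::int64_t v) {
    std::uint64_t x = static_cast<std::uint64_t>(v);
    x ^= x >> 30;
    x *= 0xbf58476d1ce4e5b9ULL;
    x ^= x >> 27;
    x *= 0x94d049bb133111ebULL;
    x ^= x >> 31;
    return x;
}

// Count how many of the first num_days daily timestamps land in each bucket
// when the bucket index is hash(timestamp) % num_buckets.
template <typename Hash>
std::vector<std::size_t> bucket_counts(Hash hash, std::size_t num_buckets,
                                       std::int64_t num_days) {
    std::vector<std::size_t> counts(num_buckets, 0);
    for (std::int64_t day = 0; day < num_days; ++day) {
        ++counts[hash(day * NANOS_PER_DAY) % num_buckets];
    }
    return counts;
}
```

Calling `bucket_counts(identity_hash, 16, 1000)` puts all 1000 rows in bucket zero, whereas `bucket_counts(mixed_hash, 16, 1000)` spreads them across the buckets, which is what lets the aggregation run in parallel.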

alexowens90 • Oct 23 '25 15:10