ArcticDB Performance 18254756429: Improve hash grouping aggregation parallelism

Open alexowens90 opened this issue 1 month ago • 0 comments

Reference Issues/PRs

What does this implement or fix?

Low-quality hash implementations for integral types, including at least some implementations of std::hash, are essentially a static cast, e.g. std::hash<int64_t>{}(100) == 100. This is fast, but leads to a poor distribution in our bucketing, where we take the hash modulo the number of buckets. In particular, a grouping hash on a timeseries whose time points are dates puts all of the rows into bucket zero, which leaves no parallelism in the aggregation clause.

Switch to a single hash function with improved uniformity, used consistently across all supported platforms.
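
To illustrate the bucketing problem described above, here is a minimal sketch rather than the code in this change. It assumes timestamps are stored as nanoseconds since the epoch and that the bucket count is a power of two; neither is stated above. The names identity_hash, mixing_hash and buckets_used are hypothetical, and the splitmix64 finalizer stands in for "a hash with better uniformity", not necessarily the function this change adopts.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>

// Nanoseconds in one day. Assumption: timestamps are nanoseconds since the epoch.
constexpr std::uint64_t kNanosPerDay = 86'400ULL * 1'000'000'000ULL;

// Stand-in for a std::hash implementation that is effectively a static_cast.
inline std::uint64_t identity_hash(std::uint64_t x) { return x; }

// splitmix64 finalizer, used purely as an example of a hash that mixes its
// input bits. Illustrative only; not necessarily what this change uses.
inline std::uint64_t mixing_hash(std::uint64_t x) {
    x ^= x >> 30;
    x *= 0xbf58476d1ce4e5b9ULL;
    x ^= x >> 27;
    x *= 0x94d049bb133111ebULL;
    x ^= x >> 31;
    return x;
}

// Hashes `num_days` consecutive midnight timestamps and returns how many
// distinct buckets they land in when the hash is reduced modulo `num_buckets`.
template <typename Hash>
std::size_t buckets_used(Hash hash, std::size_t num_days, std::uint64_t num_buckets) {
    std::set<std::uint64_t> used;
    for (std::size_t day = 0; day < num_days; ++day) {
        used.insert(hash(day * kNanosPerDay) % num_buckets);
    }
    return used.size();
}
```

Under these assumptions, buckets_used(identity_hash, 365, 16) returns 1: the number of nanoseconds in a day is a multiple of 2^16, so every midnight timestamp is congruent to zero modulo any power-of-two bucket count up to 65536. Passing mixing_hash instead spreads the same keys over multiple buckets, which is what lets the aggregation run in parallel.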

Oct 23 '25 15:10 alexowens90
