Optimize DistinctCountHLL aggregation for high-cardinality dictionary-encoded columns
Problem
The DISTINCTCOUNTHLL aggregation function suffers from severe performance degradation when processing high-cardinality dictionary-encoded columns (~14 million distinct values in the profiled segment). Profiling shows that ~50% of CPU time is spent in RoaringBitmap operations.
For dictionary-encoded columns, the current implementation uses a RoaringBitmap to track dictionary IDs during aggregation. While memory-efficient for low cardinality, this approach has an insertion cost that grows to O(n log n) in total, which becomes prohibitively expensive for high-cardinality columns (>100K distinct values).
Queries on high-cardinality columns (1M-15M distinct values, e.g., user IDs or member IDs) take about 6-10 seconds. RoaringBitmap operations dominate query execution time, so HLL currently shows no performance benefit over an exact distinct count.
Proposed Solution
Implement adaptive cardinality handling that dynamically switches from RoaringBitmap to HyperLogLog (a rough sketch follows the list below):
- Low cardinality: use RoaringBitmap (memory-efficient, exact counts)
- High cardinality: convert to HyperLogLog (O(1) insertions)
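A minimal sketch of the idea, assuming the RoaringBitmap library and stream-lib's `HyperLogLog` (which Pinot's HLL aggregations are based on); the class name, threshold, and precision below are illustrative placeholders, not the proposed implementation:

```java
import org.roaringbitmap.RoaringBitmap;
import com.clearspring.analytics.stream.cardinality.HyperLogLog;

// Illustrative sketch only: starts exact with a RoaringBitmap and promotes to
// an approximate HyperLogLog once the observed cardinality crosses a cutoff.
public class AdaptiveDistinctCounter {
  private static final int PROMOTION_THRESHOLD = 100_000; // placeholder cutoff
  private static final int HLL_LOG2M = 12;                // placeholder HLL precision

  private RoaringBitmap _bitmap = new RoaringBitmap();
  private HyperLogLog _hll;
  private int _exactCount;

  public void add(int dictId) {
    if (_hll != null) {
      _hll.offer(dictId); // O(1) hash + register update
      return;
    }
    // checkedAdd returns true only for a newly-seen dict id, so _exactCount
    // tracks the exact cardinality without an O(n) getCardinality() call.
    if (_bitmap.checkedAdd(dictId) && ++_exactCount > PROMOTION_THRESHOLD) {
      promote();
    }
  }

  // One-time conversion: replay the exact dict ids into the HLL, then drop
  // the bitmap so subsequent inserts stay O(1). Note: a real implementation
  // would offer the column value rather than the segment-local dict id so
  // that HLLs from different segments merge correctly.
  private void promote() {
    _hll = new HyperLogLog(HLL_LOG2M);
    for (int dictId : _bitmap) {
      _hll.offer(dictId);
    }
    _bitmap = null;
  }

  public long cardinality() {
    return _hll != null ? _hll.cardinality() : _exactCount;
  }
}
```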
Tested with POC code that chooses HyperLogLog for the high-cardinality column and observed an improvement from ~8 sec to ~700 ms.
Wow.. we never expected RoaringBitmap to be slow.. Why does it have O(n log n) insertion complexity?
From the reading I did on this, it looks like RoaringBitmap:
- Looks up the right container for the high bits
- Does a binary search on the sorted short[] to find the position → O(log k); if the value is not present, shifts the tail of the array to insert it → O(k) in the worst case
- Per insertion that is O(log k + k_shift), and across millions of distinct inserts the total cost is closer to O(n log n) (plus copies); a simplified sketch of this insert path follows the list
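For concreteness, a simplified version of that insert path (not RoaringBitmap's actual source; real array containers compare shorts as unsigned and convert to a bitmap container past 4096 entries):

```java
import java.util.Arrays;

// Simplified illustration of an array-container insert: each miss pays a
// binary search plus a tail shift, so heavy distinct-insert workloads
// accumulate O(n log n) search cost plus the arraycopy shifts.
public class ArrayContainerSketch {
  private short[] _content = new short[8];
  private int _cardinality;

  public void add(short value) {
    int pos = Arrays.binarySearch(_content, 0, _cardinality, value); // O(log k)
    if (pos >= 0) {
      return; // already present
    }
    int insertAt = -pos - 1;
    if (_cardinality == _content.length) {
      _content = Arrays.copyOf(_content, _content.length * 2); // occasional regrow + copy
    }
    // Shift the tail one slot to the right to keep the array sorted: O(k) in
    // the worst case, which is the cost the profile attributes to Roaring.
    System.arraycopy(_content, insertAt, _content, insertAt + 1, _cardinality - insertAt);
    _content[insertAt] = value;
    _cardinality++;
  }
}
```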
Can you also try out BitSet, which guarantees O(1) complexity per insert (sketched after the list below)?
I feel the tradeoff is between:
- Memory: RoaringBitmap has better compaction with fewer inserts over a wider search space
- Performance: Besides the allocation, BitSet has more guaranteed performance
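For reference, a minimal sketch of the BitSet variant over dictionary ids; the class name and sizes are illustrative:

```java
import java.util.BitSet;

// Minimal sketch of the BitSet alternative: one bit per dictionary id, so
// set() is O(1) with no container lookup or tail shift, but the memory cost
// is dictionarySize / 8 bytes regardless of how many distinct values appear.
public class BitSetDistinctSketch {
  public static void main(String[] args) {
    int dictionarySize = 14_000_000;                // ~1.75 MB of bits for a 14M-entry dictionary
    BitSet seen = new BitSet(dictionarySize);

    int[] dictIdsForRows = {3, 7, 3, 1_000_000, 7}; // stand-in for per-row dict ids
    for (int dictId : dictIdsForRows) {
      seen.set(dictId);                             // O(1) per insert
    }
    System.out.println(seen.cardinality());         // exact distinct count -> 3
  }
}
```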
Currently RoaringBitmap is used to reduce dictionary lookups. For low cardinality, BitSet might always outperform RoaringBitmap. For high cardinality, when the same dictionary id repeats a lot, directly inserting into HLL might produce worse performance.
Thanks @Jackie-Jiang, let me consider that when testing this.
> For high cardinality, when the same dictionary id repeats a lot, directly inserting into HLL might produce worse performance
In the segment I tested (~25M rows, ~14M cardinality), HLL outperformed for the query I tested, which produced ~11 million distinct values. That said, I agree the picture flips in the low-cardinality case. If we had something like 25M rows and ~10K distinct values, then:
- BitSet should likely win on throughput thanks to O(1) set operations and predictable access patterns, while HLL may end up paying a similar per-row cost without gaining much.
Replacing RoaringBitmap with BitSet might solve the performance issue but could have memory implications. So I was thinking of introducing a switch threshold (similar in spirit to "smartHLL"): start with an exact structure (e.g., BitSet/Roaring), and only promote to HLL once the observed distinct count / density indicates it's worth it (roughly sketched below). I'll test a range of cardinalities and repetition patterns to see if there's a "sweet spot" for that cutoff.
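A rough sketch of what that promotion check could look like; both cutoffs are placeholders for whatever the benchmark sweep ends up choosing:

```java
// Hypothetical "smartHLL"-style promotion predicate; all constants are
// placeholders to be tuned by the planned benchmarks, not measured values.
public class SmartHllSwitchSketch {
  private static final long DISTINCT_CUTOFF = 100_000L; // size of the exact structure
  private static final long HLL_BYTES = 1L << 14;       // e.g. 16 KB of HLL registers

  // Promote only when the exact structure is both large in absolute terms and
  // more expensive than a fixed-size HLL would be for this dictionary.
  public static boolean shouldPromoteToHll(long observedDistinct, long dictionarySize) {
    long bitSetBytes = dictionarySize / 8;              // upfront cost of an exact BitSet
    return observedDistinct > DISTINCT_CUTOFF && bitSetBytes > HLL_BYTES;
  }
}
```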