starrocks
starrocks copied to clipboard
HLL function with higher precision
- [x] https://github.com/StarRocks/starrocks/pull/20836
Feature request
Is your feature request related to a problem? Please describe.
The HyperLogLog (HLL) algorithm is a probabilistic algorithm used to estimate the cardinality (number of distinct elements) of a set with a high degree of accuracy, while using minimal memory.
StarRocks also has structure and functions related to HLL. But the precision of HLL algorithm is limited to 2^14, because the bias of current HLL algorithm is hard coded in hll.cpp.:
double bias = 5.9119 * 1.0e-18 * (estimate * estimate * estimate * estimate) -
1.4253 * 1.0e-12 * (estimate * estimate * estimate) + 1.2940 * 1.0e-7 * (estimate * estimate) -
5.2921 * 1.0e-3 * estimate + 83.3216;
So the current algorithm only works well in precision 2^14.
Describe the solution you'd like
Keep the current HLL structure and functions, add one function uniq_combined() to support higher precision, and the precision can be changed with parameter.
Describe alternatives you've considered
Introduce HLL function uniq_combined() which powered by Apache Datasketches. This project is dedicated to providing a broad selection of sketch algorithms of production quality which can produce results orders-of magnitude faster and with mathematically proven error bounds.
Maybe more functions can be developed based on this toolkit, which allows simplification of system's architecture and reduction in overall compute resources required for these heretofore difficult computational tasks.
Will split PR #20832 into 2 PRs:
- Introduce datasketches toolkit;
- New HLL function based on datasketches.
@kangkaisen Needn't the PR #20836 be merged before closing this issue?
We have marked this issue as stale because it has been inactive for 6 months. If this issue is still relevant, removing the stale label or adding a comment will keep it active. Otherwise, we'll close it in 10 days to keep the issue queue tidy. Thank you for your contribution to StarRocks!