
Prototype HLL buffers in manifest files to provide column distinct estimates.

Open rdblue opened this issue 7 years ago • 2 comments

Per-file distinct counts aren't very valuable for cost-based optimization because they can't be easily merged across files. They should be removed. As a replacement, look into storing HLL buffers, if they aren't too large.
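To illustrate the merge problem, here is a minimal sketch of a dense HyperLogLog in Python. It is not Iceberg code and not the airlift format; the class `HLL`, the precision `p=11`, and the SHA-1 based hash are all assumptions chosen for the example. It shows that two per-file sketches can be combined with a register-wise max and give the same result as a sketch built over the union, whereas two plain distinct counts can only be added, which overcounts any values the files share.

```python
import hashlib
import math


class HLL:
    """Minimal dense HyperLogLog (illustrative only, hypothetical class)."""

    def __init__(self, p=11):
        self.p = p                      # precision: 2**p registers
        self.m = 1 << p
        self.registers = bytearray(self.m)

    def add(self, value):
        # Derive a 64-bit hash from SHA-1 (an assumption; any good hash works).
        digest = hashlib.sha1(str(value).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h >> (64 - self.p)                    # top p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)       # remaining 64 - p bits
        # Rank: 1-based position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def merge(self, other):
        # Merging is a register-wise max, so it is cheap and order-independent.
        if self.p != other.p:
            raise ValueError("precision mismatch")
        merged = HLL(self.p)
        merged.registers = bytearray(
            max(a, b) for a, b in zip(self.registers, other.registers)
        )
        return merged

    def estimate(self):
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)            # bias constant for m >= 128
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            return m * math.log(m / zeros)          # small-range correction
        return raw


def sketch(values, p=11):
    hll = HLL(p)
    for v in values:
        hll.add(v)
    return hll


# Two hypothetical data files whose column values overlap.
file_a = range(0, 6000)
file_b = range(4000, 10000)

# Adding exact per-file distinct counts overcounts the shared values.
naive_sum = len(set(file_a)) + len(set(file_b))     # 12000, true answer is 10000

# Merging the per-file sketches gives an estimate for the union.
merged = sketch(file_a).merge(sketch(file_b))
print(naive_sum, round(merged.estimate()))
```

The design point is that the merged registers are identical to those of a sketch built directly over both files, so a planner could combine per-file buffers from a manifest without rereading any data.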

rdblue avatar Jan 31 '18 23:01 rdblue

Removed distinct counts in 75088f6875fc8d3cc4c3af38899742de1b919abf.

rdblue avatar Feb 16 '18 18:02 rdblue

The Presto team has some code for HLL.

Format description - https://github.com/airlift/airlift/blob/master/stats/docs/hll.md
Code - https://github.com/airlift/airlift/tree/master/stats/src/main/java/io/airlift/stats/cardinality

I need to play with it, but the summaries can be pretty large.
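As a rough back-of-the-envelope check on size, the sketch below computes the footprint and the textbook standard error of a generic dense HLL. It assumes 6 bits per register and the standard `1.04 / sqrt(m)` error bound; the function names are hypothetical, and the airlift encoding linked above may pack registers differently or use a sparse layout for small cardinalities, so treat the numbers as an upper-bound style estimate rather than a measurement.

```python
import math


def dense_hll_size_bytes(p, bits_per_register=6):
    """Bytes needed for 2**p registers (assumed 6 bits each, no header)."""
    return math.ceil((1 << p) * bits_per_register / 8)


def hll_standard_error(p):
    """Textbook relative standard error for an HLL with 2**p registers."""
    return 1.04 / math.sqrt(1 << p)


for p in (8, 11, 14):
    print(p, dense_hll_size_bytes(p), round(hll_standard_error(p), 4))
```

Under these assumptions, one buffer at `p=11` is about 1.5 KB per column per file, which adds up quickly across wide tables and many manifest entries; lowering `p` shrinks the buffer at the cost of a larger error.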

omalley avatar Mar 07 '18 16:03 omalley