quickwit

Add support for the cardinality aggregation.

Open fulmicoton opened this issue 2 years ago • 5 comments

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-cardinality-aggregation.html

I am not sure what the right approach should be here.

Do we want an approximate sketch for cardinality that supports union, or do we just want to carry sets of hashes?
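For comparison, the "sets of hashes" option could look roughly like the sketch below. It is only an illustration: the `hash64` and `collect_hashes` helpers and the choice of BLAKE2b are assumptions, not anything Quickwit or tantivy defines. Each split returns the set of 64-bit hashes it saw, and merging is a plain set union.

```python
import hashlib


def hash64(data: bytes) -> int:
    """Stable 64-bit hash; BLAKE2b is an arbitrary choice for this sketch."""
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def collect_hashes(values):
    """Hypothetical per-split partial result: the set of hashes seen."""
    return {hash64(v.encode("utf-8")) for v in values}


split_a = collect_hashes(["alice", "bob", "carol"])
split_b = collect_hashes(["bob", "dave"])

# Merging partial results is a set union; the count is exact up to
# 64-bit hash collisions, but memory grows with the number of distinct values.
merged = split_a | split_b
print(len(merged))
```

The trade-off is the one raised in the question: the union is trivially mergeable and exact in practice, but every partial result costs 8 bytes per distinct value, whereas a sketch has a fixed size.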

fulmicoton avatar Aug 30 '23 05:08 fulmicoton

HyperLogLog++ is the commonly used algorithm here, and it also allows merges.
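To make the merge property concrete, here is a minimal sketch of plain HyperLogLog, not HyperLogLog++: it has no sparse representation and no empirical bias correction, only the linear-counting fallback for small cardinalities. The class, the `sketch_of` helper, and the BLAKE2b-based `hash64` are illustrative assumptions and do not reflect the tantivy implementation.

```python
import hashlib
import math


def hash64(data: bytes) -> int:
    """Stable 64-bit hash; BLAKE2b is an arbitrary choice for this sketch."""
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


class HyperLogLog:
    def __init__(self, p: int = 12):
        self.p = p                      # precision: 2**p registers
        self.m = 1 << p
        self.registers = [0] * self.m

    def add_hash(self, h: int) -> None:
        idx = h >> (64 - self.p)                    # top p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)       # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def merge(self, other: "HyperLogLog") -> None:
        assert self.p == other.p, "sketches must share the same precision"
        # Union of two sketches is the register-wise maximum.
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]

    def estimate(self) -> float:
        alpha = 0.7213 / (1 + 1.079 / self.m)       # constant for m >= 128
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            return self.m * math.log(self.m / zeros)  # linear counting
        return raw


def sketch_of(values, p: int = 12) -> HyperLogLog:
    hll = HyperLogLog(p)
    for v in values:
        hll.add_hash(hash64(v.to_bytes(8, "little")))
    return hll


# Two partial results with overlapping values, merged into one sketch.
split_a = sketch_of(range(0, 60_000))
split_b = sketch_of(range(40_000, 100_000))
split_a.merge(split_b)
print(round(split_a.estimate()))  # approximates 100_000 distinct values
```

Because the merge is a register-wise maximum, the merged sketch is identical to one built from all values at once, which is what makes the structure suitable for combining per-split results.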

Numerical values should be straightforward: we can just use the u64 representation.

Terms are tricky, since the term ordinals are not comparable and fetching the terms may be way too slow. We would probably need to delay fetching the term values until the end of the aggregation. We could just use a terms aggregation and, at the end, fetch all the values and put them into HyperLogLog++.
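The deferred lookup described above could be sketched as follows. Everything here is hypothetical: `Segment`, `term_dict`, and `doc_ords` are stand-ins for a per-segment term dictionary and ordinal column, and a plain set of hashes stands in for the HyperLogLog++ sketch to keep the example short.

```python
import hashlib
from dataclasses import dataclass


def hash64(data: bytes) -> int:
    """Stable 64-bit hash; BLAKE2b is an arbitrary choice for this sketch."""
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


@dataclass
class Segment:
    term_dict: list  # ordinal -> term, local to this segment (hypothetical)
    doc_ords: list   # one term ordinal per document (hypothetical)


def collect_ordinals(segment: Segment) -> set:
    """Collection phase: only cheap integer ordinals are recorded."""
    return set(segment.doc_ords)


def resolve_and_hash(segment: Segment, ordinals: set) -> set:
    """End of aggregation: each distinct ordinal is resolved to its term once."""
    return {hash64(segment.term_dict[o].encode("utf-8")) for o in sorted(ordinals)}


# "bob" has ordinal 1 in the first segment and 0 in the second, so ordinals
# cannot be compared across segments; the hashes of the terms can.
seg1 = Segment(term_dict=["alice", "bob"], doc_ords=[0, 1, 1, 0])
seg2 = Segment(term_dict=["bob", "carol", "dave"], doc_ords=[0, 2, 2])

hashes = set()
for seg in (seg1, seg2):
    hashes |= resolve_and_hash(seg, collect_ordinals(seg))

print(len(hashes))  # distinct terms that occur in documents, across segments
```

The point of the two phases is that the dictionary is touched once per distinct ordinal per segment rather than once per document.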

PSeitz avatar Aug 30 '23 12:08 PSeitz

Yes. Maybe we can even use the delta encoding to compute the hashes a little bit faster.
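One possible reading of this idea, assuming the term dictionary stores sorted terms as (shared prefix length, suffix) pairs: with a hash that consumes bytes sequentially, the state reached after the shared prefix can be reused instead of rehashing the whole term. The sketch below uses FNV-1a only because its state is a single integer; the function names and the front-coding layout are assumptions for illustration, not the tantivy format, and FNV-1a is not necessarily a good hash for a cardinality sketch.

```python
FNV_OFFSET = 0xCBF29CE484222325  # FNV-1a 64-bit offset basis
FNV_PRIME = 0x100000001B3        # FNV-1a 64-bit prime
MASK64 = (1 << 64) - 1


def fnv1a(data: bytes) -> int:
    """Reference implementation: hash the full term from scratch."""
    state = FNV_OFFSET
    for byte in data:
        state = ((state ^ byte) * FNV_PRIME) & MASK64
    return state


def front_code(sorted_terms):
    """Encode sorted terms as (shared_prefix_len, suffix) pairs."""
    prev, out = b"", []
    for term in sorted_terms:
        n = 0
        while n < min(len(prev), len(term)) and prev[n] == term[n]:
            n += 1
        out.append((n, term[n:]))
        prev = term
    return out


def hash_front_coded(entries):
    """Hash each term, resuming from the state after the shared prefix."""
    states = [FNV_OFFSET]  # states[i] = hash state after i bytes of the current term
    hashes = []
    for shared, suffix in entries:
        del states[shared + 1:]      # keep only the states of the shared prefix
        state = states[-1]
        for byte in suffix:          # only the suffix bytes are hashed
            state = ((state ^ byte) * FNV_PRIME) & MASK64
            states.append(state)
        hashes.append(state)
    return hashes


terms = sorted([b"error_400", b"error_401", b"error_500", b"warn", b"warning"])
incremental = hash_front_coded(front_code(terms))
print(incremental == [fnv1a(t) for t in terms])
```

The saving is proportional to the average shared prefix length, so it would matter most for fields with long, similar terms.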

fulmicoton avatar Aug 30 '23 22:08 fulmicoton

Hi folks. I'm just throwing my support onto this ticket, as this is the only thing blocking us from moving from Elastic to Quickwit. Essentially all of our dashboards group by a term, be that a userId or errorCodes etc.

I'm just checking to see if this is on a roadmap?

mcmarkj avatar Jun 25 '24 15:06 mcmarkj

It seems like the PR adding support for the cardinality aggregation in tantivy is in good shape, so we should be able to ship that feature in the next release scheduled for July.

@PSeitz, do you think you can help push that tantivy PR across the finish line and perform the integration work in Quickwit (mostly documentation, I assume) in the next few weeks?

guilload avatar Jun 25 '24 15:06 guilload

Yes, that should be possible.

PSeitz avatar Jun 30 '24 23:06 PSeitz

https://github.com/quickwit-oss/quickwit/pull/5204

PSeitz avatar Sep 09 '24 01:09 PSeitz