quickwit
Add support for the cardinality aggregation.
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-cardinality-aggregation.html
I am not sure what the right approach should be here.
Do we want an approximate sketch for cardinality that supports union, or do we just want to carry sets of hashes?
HyperLogLog++ is the commonly used algorithm here, and it also supports merges.
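To make the merge property concrete, here is a minimal sketch of a plain HyperLogLog (not the full HyperLogLog++ with its sparse representation and bias correction). The `Hll` type, the precision `P = 12`, and the use of std's `DefaultHasher` are assumptions for illustration only, not what tantivy or Quickwit ship.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Precision: 2^P registers. P = 12 is an arbitrary choice for this sketch.
const P: u32 = 12;
const M: usize = 1 << P;

/// Toy HyperLogLog (hypothetical type, plain HLL rather than HLL++).
#[derive(Clone)]
struct Hll {
    registers: Vec<u8>,
}

impl Hll {
    fn new() -> Self {
        Hll { registers: vec![0; M] }
    }

    /// Insert an already computed 64-bit hash.
    fn insert_hash(&mut self, hash: u64) {
        // The top P bits select the register.
        let idx = (hash >> (64 - P)) as usize;
        // The remaining bits give the rank: 1-based position of the first 1 bit.
        let rest = hash << P;
        let rank = (rest.leading_zeros().min(64 - P) + 1) as u8;
        if rank > self.registers[idx] {
            self.registers[idx] = rank;
        }
    }

    fn insert<T: Hash>(&mut self, value: &T) {
        let mut hasher = DefaultHasher::new();
        value.hash(&mut hasher);
        self.insert_hash(hasher.finish());
    }

    /// Union of two sketches: register-wise maximum.
    fn merge(&mut self, other: &Hll) {
        for (a, b) in self.registers.iter_mut().zip(&other.registers) {
            *a = (*a).max(*b);
        }
    }

    fn estimate(&self) -> f64 {
        let m = M as f64;
        let alpha = 0.7213 / (1.0 + 1.079 / m);
        let sum: f64 = self.registers.iter().map(|&r| 2f64.powi(-(r as i32))).sum();
        let raw = alpha * m * m / sum;
        let zeros = self.registers.iter().filter(|&&r| r == 0).count();
        if raw <= 2.5 * m && zeros > 0 {
            // Small-range correction: fall back to linear counting.
            m * (m / zeros as f64).ln()
        } else {
            raw
        }
    }
}
```

Because `merge` is a register-wise maximum, merging per-split sketches yields exactly the same registers as inserting every value into a single sketch, which is what makes this structure usable for a distributed aggregation.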
Numerical values should be straightforward: we can just use their u64 representation.
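As a rough sketch of what "use the u64 representation" could look like, the helpers below map `i64` and `f64` values to `u64` in an order-preserving, one-to-one way, so the result can be hashed or inserted into a set directly. The function names are hypothetical and written for this example; they are not claimed to match tantivy's internals.

```rust
use std::collections::HashSet;

/// Map an i64 to a u64 by flipping the sign bit (order-preserving, one-to-one).
fn i64_to_u64(v: i64) -> u64 {
    (v as u64) ^ (1u64 << 63)
}

/// Map an f64 to a u64 based on its IEEE 754 bits (order-preserving for non-NaN values).
/// Note: 0.0 and -0.0 map to different u64 values, and so do distinct NaN payloads.
fn f64_to_u64(v: f64) -> u64 {
    let bits = v.to_bits();
    if bits >> 63 == 1 {
        // Negative numbers: invert all bits so larger magnitudes sort lower.
        !bits
    } else {
        // Non-negative numbers: set the sign bit so they sort above negatives.
        bits ^ (1u64 << 63)
    }
}

/// Exact distinct count over i64 values using their u64 representation.
fn count_distinct_i64(values: &[i64]) -> usize {
    values.iter().map(|&v| i64_to_u64(v)).collect::<HashSet<u64>>().len()
}
```

Only the one-to-one property matters for cardinality; the order-preserving part is incidental here.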
Terms are tricky, since term ordinals are not comparable across segments and fetching the terms may be way too slow. We would probably need to delay fetching the term values until the end of the aggregation. We could just use a terms aggregation, then fetch all the values at the end and feed them into HyperLogLog++.
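A minimal sketch of that deferred approach, under some loud assumptions: `Segment` is a hypothetical stand-in for a segment with a local term dictionary and one term ordinal per document, and a `HashSet<u64>` of hashes stands in for the HyperLogLog++ sketch. Ordinals are only collected during the scan; each distinct term is fetched and hashed once at the end.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for one segment (not a tantivy type).
struct Segment {
    /// Ordinal -> term. Ordinals are local to this segment.
    term_dict: Vec<String>,
    /// One term ordinal per document.
    doc_ordinals: Vec<u32>,
}

fn hash_term(term: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    term.hash(&mut hasher);
    hasher.finish()
}

fn segment_term_hashes(segment: &Segment) -> HashSet<u64> {
    // Phase 1: scan documents and only mark which ordinals occur (no term lookups).
    let mut seen = vec![false; segment.term_dict.len()];
    for &ord in &segment.doc_ordinals {
        seen[ord as usize] = true;
    }
    // Phase 2: resolve each distinct ordinal to its term once, then hash it.
    let mut hashes = HashSet::new();
    for (ord, &was_seen) in seen.iter().enumerate() {
        if was_seen {
            hashes.insert(hash_term(&segment.term_dict[ord]));
        }
    }
    hashes
}

/// Merge the per-segment hash sets; hashes are comparable across segments,
/// unlike the ordinals they were derived from.
fn cardinality(segments: &[Segment]) -> usize {
    let mut all = HashSet::new();
    for segment in segments {
        all.extend(segment_term_hashes(segment));
    }
    all.len()
}
```

With this shape, the number of dictionary lookups is bounded by the number of distinct terms per segment rather than by the number of documents.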
Yes. Maybe we can even use the delta encoding to compute the hashes a little bit faster.
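One possible reading of this idea, sketched under assumptions: if the sorted terms are front-coded as `(shared_prefix_len, suffix)` pairs, a byte-streaming hash can resume from the state saved at the shared prefix instead of rehashing the whole term. FNV-1a is used here only because its state is trivially resumable; the entry layout and function names are hypothetical, and FNV-1a is not claimed to be a good hash for HyperLogLog++.

```rust
const FNV_OFFSET: u64 = 0xcbf29ce484222325;
const FNV_PRIME: u64 = 0x100000001b3;

/// Reference FNV-1a over a full byte string.
fn fnv1a(bytes: &[u8]) -> u64 {
    bytes
        .iter()
        .fold(FNV_OFFSET, |h, &b| (h ^ b as u64).wrapping_mul(FNV_PRIME))
}

/// Hash front-coded terms. Each entry is (shared_prefix_len, suffix), where
/// shared_prefix_len counts the bytes shared with the previous term.
/// Assumes well-formed input: shared_prefix_len never exceeds the previous
/// term's length (otherwise the index below panics).
fn hash_front_coded(entries: &[(usize, &[u8])]) -> Vec<u64> {
    // states[i] is the FNV-1a state after the first i bytes of the current term.
    let mut states: Vec<u64> = vec![FNV_OFFSET];
    let mut out = Vec::with_capacity(entries.len());
    for &(shared, suffix) in entries {
        // Drop the states that belong to the non-shared tail of the previous term.
        states.truncate(shared + 1);
        let mut h = states[shared];
        // Only the suffix bytes need to be hashed.
        for &b in suffix {
            h = (h ^ b as u64).wrapping_mul(FNV_PRIME);
            states.push(h);
        }
        out.push(h);
    }
    out
}
```

The work per term is proportional to the suffix length, so long shared prefixes (for example many terms starting with `error_`) are hashed only once.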
Hi folks. I'm just throwing my support onto this ticket, as this is the only thing blocking us from moving from Elastic to Quickwit. Essentially all of our dashboards group by a term, be that a userId or errorCodes etc.
I'm just checking to see if this is on the roadmap?
It seems like the PR adding support for the cardinality aggregation in tantivy is in good shape, so we should be able to ship that feature in the next release scheduled for July.
@PSeitz, do you think you can help push that tantivy PR across the finish line and perform the integration work in Quickwit (mostly documentation, I assume) in the next few weeks?
Yes, that should be possible.
https://github.com/quickwit-oss/quickwit/pull/5204