vortex
Add cardinality estimate stat
This would help the compressor decide whether Dict compression is worthwhile.
There's already a Rust crate implementing HyperLogLog++ cardinality estimation: https://docs.rs/hyperloglogplus/latest/hyperloglogplus/struct.HyperLogLogPlus.html
The estimate can be used:
- At compress time: to determine whether Dict is worth trying, or whether to fall back directly to FSST
- At query time: DataFusion allows reporting cardinality estimates, which are used for join selection: https://github.com/apache/datafusion/blob/8ba6732af5f4f32cbe0a23ef6bc2f393c640898b/datafusion/physical-plan/src/joins/utils.rs#L905
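
To make the compress-time case concrete, here is a rough sketch of how such a check might look. It is not Vortex code and does not use the `hyperloglogplus` crate's API: `HyperLogLog` below is a deliberately minimal, std-only stand-in (plain HyperLogLog with a linear-counting fallback for small cardinalities), and `dict_looks_worthwhile` and its `max_distinct_ratio` parameter are hypothetical names invented for illustration.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Minimal HyperLogLog sketch, for illustration only.
/// A real implementation would more likely wrap an existing crate.
pub struct HyperLogLog {
    precision: u8,      // number of hash bits used to pick a register
    registers: Vec<u8>, // 2^precision registers, each holding a max rank
}

impl HyperLogLog {
    pub fn new(precision: u8) -> Self {
        assert!((4..=16).contains(&precision), "precision must be in 4..=16");
        Self { precision, registers: vec![0; 1usize << precision] }
    }

    pub fn insert<T: Hash + ?Sized>(&mut self, value: &T) {
        let mut hasher = DefaultHasher::new();
        value.hash(&mut hasher);
        let hash = hasher.finish();

        let p = u32::from(self.precision);
        // The top p bits select the register.
        let index = (hash >> (64 - p)) as usize;
        // Rank = position of the first 1-bit in the remaining 64 - p bits.
        let rest = hash << p;
        let rank = (rest.leading_zeros().min(64 - p) + 1) as u8;
        if rank > self.registers[index] {
            self.registers[index] = rank;
        }
    }

    pub fn estimate(&self) -> f64 {
        let m = self.registers.len() as f64;
        // Standard bias-correction constants for HyperLogLog.
        let alpha = match self.registers.len() {
            16 => 0.673,
            32 => 0.697,
            64 => 0.709,
            _ => 0.7213 / (1.0 + 1.079 / m),
        };
        let sum: f64 = self
            .registers
            .iter()
            .map(|&r| 2f64.powi(-i32::from(r)))
            .sum();
        let raw = alpha * m * m / sum;

        // Small-range correction: use linear counting while registers are still empty.
        let zeros = self.registers.iter().filter(|&&r| r == 0).count();
        if raw <= 2.5 * m && zeros > 0 {
            m * (m / zeros as f64).ln()
        } else {
            raw
        }
    }
}

/// Hypothetical heuristic: Dict is only worth trying when the estimated
/// fraction of distinct values is at or below `max_distinct_ratio`.
pub fn dict_looks_worthwhile<T: Hash>(values: &[T], max_distinct_ratio: f64) -> bool {
    if values.is_empty() {
        return false;
    }
    let mut hll = HyperLogLog::new(12);
    for v in values {
        hll.insert(v);
    }
    hll.estimate() / values.len() as f64 <= max_distinct_ratio
}
```

With something like this, the compressor could call `dict_looks_worthwhile(&values, 0.5)` and skip straight to FSST when it returns `false`. The 0.5 threshold and the precision of 12 are placeholders; the right values would need to be measured. The same estimate could be stored as a stat and surfaced to DataFusion as an inexact distinct count.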