
Switch to HyperLogLog for domain tests

Open okennedy opened this issue 5 years ago • 0 comments

The shape watcher lens currently runs a Count Distinct query during the training phase to discover categorical attributes. An exact distinct count is expensive on large datasets, since it has to track every distinct value it sees. Fortunately, we don't care about the actual number of distinct values, only whether it is below some threshold. A HyperLogLog estimate would be a much more efficient way to achieve the same goal.

okennedy · Jan 23 '20 16:01