column
How do I efficiently query for unique values of a field?
say I get a stream of data: {machineCode: "
Is there a way to efficiently get all the unique machine codes? Or should I just keep track of them while inserting data?
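There's no built-in feature in column for this, but there are two ways I can think of to solve it:
- if an approximate count of distinct machine codes is enough, feed the codes into a HyperLogLog (it estimates how many unique values there are, but can't list them)
- otherwise, if you need the actual values, keep them in a standard map/set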
Either one can be built during insertion, or by a range query that iterates over all elements.
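For the map/set option, a minimal sketch of tracking the codes during insertion could look like the following. It deliberately doesn't touch the column API: `Record`, `UniqueTracker`, `Observe` and `Codes` are hypothetical names made up for illustration, and you'd call `Observe` next to wherever you insert a row.

```go
package main

import (
	"fmt"
	"sort"
)

// Record is a hypothetical stand-in for one element of the incoming stream.
type Record struct {
	MachineCode string
}

// UniqueTracker keeps an exact set of the machine codes seen so far.
type UniqueTracker struct {
	seen map[string]struct{}
}

// NewUniqueTracker returns an empty tracker.
func NewUniqueTracker() *UniqueTracker {
	return &UniqueTracker{seen: make(map[string]struct{})}
}

// Observe records the machine code of r. Adding a code that is
// already present is a no-op, so duplicates cost nothing extra.
func (u *UniqueTracker) Observe(r Record) {
	u.seen[r.MachineCode] = struct{}{}
}

// Codes returns the unique machine codes in sorted order.
func (u *UniqueTracker) Codes() []string {
	out := make([]string, 0, len(u.seen))
	for code := range u.seen {
		out = append(out, code)
	}
	sort.Strings(out)
	return out
}

func main() {
	tracker := NewUniqueTracker()
	for _, r := range []Record{{"A1"}, {"B2"}, {"A1"}} {
		tracker.Observe(r) // done alongside the actual insert
	}
	fmt.Println(tracker.Codes()) // prints [A1 B2]
}
```

Memory grows with the number of distinct codes rather than the number of rows, which is usually fine for something like machine identifiers. The sketch isn't safe for concurrent writers; guard it with a mutex if inserts happen from multiple goroutines.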
Thanks, I went with the second method, but that leaves me having to run the range query when restoring state from a snapshot :-(