High runtime/space complexity for construction with many unique `#[venndb(filter)]` fields
The space (and construction-time) complexity of a `Db` for a type that has many (but not all) unique `#[venndb(filter)]` values appears to be O(n²), which quickly makes constructing (and holding) such a `Db` infeasible for collections with more than ~100,000 unique filter values.
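To put a rough number on that, here is a back-of-the-envelope calculation, assuming one dense n-bit bitmap per unique filter value (the function name is hypothetical, just for illustration):

```rust
/// Approximate cost of one dense bitmap per unique filter value:
/// n unique values, each owning an n-bit dense bitmap, is n*n bits total.
fn dense_bitmap_bytes(n: u64) -> u64 {
    n * n / 8
}

fn main() {
    // ~100,000 unique values => ~1.25e9 bytes of bitmaps
    // (roughly 1.16 GiB) before any payload data is stored.
    let bytes = dense_bitmap_bytes(100_000);
    println!(
        "{} bytes (~{:.2} GiB)",
        bytes,
        bytes as f64 / (1u64 << 30) as f64
    );
}
```

Since almost all of those bits are zero when values are unique, most of that memory is wasted on empty space.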
Among other solutions, I wonder if we could use some sort of compressed bitset implementation, like `roaring` or `hi_sparse_bitset`, to better handle this scenario with minimal to no performance loss in the "large number of total entries, small number of unique filter field values" case.
If we're not interested in adjusting the current implementation, this could also be gated behind a feature flag or made configurable via a derive attribute.
I'd be happy to take a look at feasibility if it's something that would be considered :smiley:
Thank you for raising this. My filters are usually < 16, so I have not really faced this issue before. I agree, however, that this is a legitimate concern.
Since v0.6.0 we have been using our own bitset implementation. In the future I was also thinking of providing an atomic variant, for opt-in atomic support of some fields (so that they can be updated).
Happy to improve those implementations so that we do not naively allocate based on the maximum index. That sounds like a useful improvement for sure, and not only for filters but in general.
Are you still willing to pick this up? If so, you could perhaps look at the current `bitvec.rs` file in this repo, and we can figure out together in this issue how to make it "sparse" or something similar.
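As a starting point for discussion, here is a minimal sketch of what a "sparse" variant could look like, using a chunked map so that memory scales with the number of set bits rather than the maximum index. All names are hypothetical; this is not the current `bitvec.rs`, just one possible shape:

```rust
use std::collections::BTreeMap;

/// A minimal sparse bitset: only 64-bit chunks containing at least one
/// set bit are stored, so two far-apart bits cost two words instead of
/// a dense allocation up to the maximum index.
#[derive(Default)]
struct SparseBitSet {
    chunks: BTreeMap<u32, u64>, // chunk index -> 64-bit word
}

impl SparseBitSet {
    fn set(&mut self, bit: u32) {
        *self.chunks.entry(bit / 64).or_insert(0) |= 1u64 << (bit % 64);
    }

    fn contains(&self, bit: u32) -> bool {
        self.chunks
            .get(&(bit / 64))
            .map_or(false, |word| word & (1u64 << (bit % 64)) != 0)
    }

    /// Intersection (the core of filter queries): only chunks present
    /// in *both* sets need to be visited.
    fn and(&self, other: &Self) -> Self {
        let mut out = SparseBitSet::default();
        for (idx, word) in &self.chunks {
            if let Some(other_word) = other.chunks.get(idx) {
                let v = word & other_word;
                if v != 0 {
                    out.chunks.insert(*idx, v);
                }
            }
        }
        out
    }
}

fn main() {
    let mut a = SparseBitSet::default();
    let mut b = SparseBitSet::default();
    a.set(3);
    a.set(1_000_000); // costs one extra chunk, not ~15,626 dense words
    b.set(3);

    let c = a.and(&b);
    assert!(c.contains(3));
    assert!(!c.contains(1_000_000));
    println!("chunks stored in a: {}", a.chunks.len()); // prints 2
}
```

A `BTreeMap` keeps chunks ordered, which makes in-order iteration for query results straightforward; a crate like `roaring` applies the same chunking idea with more aggressive per-chunk compression.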