[FEA] Primitives for pre-filtering and post-filtering
The IVF methods currently accept pre-filtering functions that are applied during the scan, but we still need optimized primitives that let users express and perform the filtering logic efficiently. Post-filtering, which is applied to the results after the search, could likely benefit from the same or a similar set of optimized primitives.
So far we've discussed simple bitsets, roaring bitmaps, bloom filters, and hash tables as primitives to build filtering functions on top of. Initially, these APIs could probably accept an array of ids on device or host and produce a data structure from which a filter function can be derived and passed directly into search_with_filter. I think it will be important to consider the API design up front so we can provide as unified an API experience as possible.
Another design detail to consider is that these primitives should be able to produce a filtering function that can work with both pre-filtering and post-filtering.
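To make the idea concrete, here is a minimal host-only sketch of the simplest case: building a bitset from an array of allowed ids and wrapping it in a filter functor that can serve both pre-filtering and post-filtering. Every name in it (`bitset_sketch`, `make_bitset`, `bitset_filter_sketch`, `post_filter`) is hypothetical and chosen for illustration only. The `(query_ix, sample_ix)` call signature is likewise an assumption, not the actual interface expected by search_with_filter, and a real primitive would live in device memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side bitset: bit i set means "sample i is allowed".
class bitset_sketch {
 public:
  explicit bitset_sketch(std::size_t n_samples)
    : n_samples_(n_samples), words_((n_samples + 63) / 64, 0) {}

  // Mark a sample id as allowed.
  void set(std::uint64_t id) {
    assert(id < n_samples_);  // debug-only bounds check
    words_[id / 64] |= (std::uint64_t{1} << (id % 64));
  }

  // Out-of-range ids are treated as "not allowed".
  bool test(std::uint64_t id) const {
    return id < n_samples_ && ((words_[id / 64] >> (id % 64)) & 1u);
  }

 private:
  std::size_t n_samples_;
  std::vector<std::uint64_t> words_;
};

// Build the bitset from an array of allowed ids (host array in this sketch).
inline bitset_sketch make_bitset(const std::vector<std::uint64_t>& ids,
                                 std::size_t n_samples) {
  bitset_sketch bits(n_samples);
  for (auto id : ids) { bits.set(id); }
  return bits;
}

// Filter functor with an assumed (query_ix, sample_ix) signature.
// It does not own the bitset, so the bitset must outlive the filter.
struct bitset_filter_sketch {
  const bitset_sketch* bits;
  bool operator()(std::uint32_t /*query_ix*/, std::uint64_t sample_ix) const {
    return bits->test(sample_ix);
  }
};

// Post-filtering: drop candidates the filter rejects, preserving order.
// A pre-filtering scan would call the same functor per sample and skip
// the distance computation when it returns false.
template <typename Filter>
std::vector<std::uint64_t> post_filter(std::uint32_t query_ix,
                                       const std::vector<std::uint64_t>& candidates,
                                       Filter keep) {
  std::vector<std::uint64_t> out;
  for (auto id : candidates) {
    if (keep(query_ix, id)) { out.push_back(id); }
  }
  return out;
}
```

Because the functor is the only thing the search or the post-processing step sees, the same call shape could in principle be backed by a bitmap, bloom filter, or hash table without changing the caller, which is the kind of unified API this issue is asking about.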
Task list so far:
- [x] bitset
- [ ] bitmap
- [ ] bloom filter
- [ ] hash table
- [ ] hashmap
Linking the discussion about template parameters for filters: https://github.com/rapidsai/raft/pull/2212#issuecomment-1979439358, and a related task:
- [ ] avoid recompiling ivf_pq::search kernels when index type changes