[Enhancement]: query optimization of inverted index under high selectivity
Is there an existing issue for this?
- [X] I have searched the existing issues
What would you like to be added?
I want to optimize queries on the inverted index when the selectivity is very high, that is, when the query matches a large share of the rows.
Why is this needed?
Currently, a query on the inverted index returns a result array containing the offsets that match the query, and this array is then applied to the target bitset. Take the term query (point query) as an example:
https://github.com/milvus-io/milvus/blob/41714142229e9a3d4e9e39f89868a53bf36a4e66/internal/core/src/index/InvertedIndexTantivy.cpp#L200-L216
If the selectivity is very high, the result array may be very large, so the cost of building and transferring it cannot be ignored.
If we could modify the bitset in place while querying the inverted index, the cost of transferring data between Rust and C would be saved. One approach we can imagine is a callback function, used roughly like this:
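To make the cost concrete, here is a minimal sketch of the materialize-then-apply pattern described above. It is not the Milvus code: `fake_term_query` and `in_query_materialized` are hypothetical stand-ins, and the toy index simply assumes that row `i` holds the value `i % 2`.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the index lookup (assumption, not a Milvus API):
// returns every offset whose value equals `term`. Row i holds the value i % 2.
std::vector<uint32_t>
fake_term_query(int64_t term, uint32_t row_count) {
    std::vector<uint32_t> hits;
    for (uint32_t offset = 0; offset < row_count; ++offset) {
        if (static_cast<int64_t>(offset % 2) == term) {
            hits.push_back(offset);
        }
    }
    return hits;
}

// Two-step pattern: first materialize the offsets, then apply them to the
// bitset. `transferred` counts how many offsets had to be copied out of the
// index before a single bit was set.
std::vector<bool>
in_query_materialized(uint32_t row_count,
                      const std::vector<int64_t>& values,
                      size_t& transferred) {
    std::vector<bool> bits(row_count, false);
    transferred = 0;
    for (int64_t v : values) {
        std::vector<uint32_t> hits = fake_term_query(v, row_count);
        transferred += hits.size();
        for (uint32_t offset : hits) {
            bits[offset] = true;
        }
    }
    return bits;
}
```

In this toy setup the intermediate array grows linearly with the number of matching rows, which is the overhead this issue is about.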
    template <typename T>
    const TargetBitmap
    InvertedIndexTantivy<T>::In(size_t n, const T* values) {
        TargetBitmap bitset(Count());
        auto callback = [&bitset](uint32_t offset) -> void { bitset[offset] = true; };
        for (size_t i = 0; i < n; ++i) {
            wrapper_->term_query_with_callback(values[i], callback);
        }
        return bitset;
    }
To complete this, we have to do two things:
- Investigate how to pass a function from C to Rust;
- Refactor the query on the inverted index so that it calls the function instead of returning the offsets array.
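For the first item, one common way to hand a capturing callback across a C ABI is a plain function pointer plus an opaque `void*` context, with a small trampoline that recovers the context. The sketch below shows only the C++ side, under stated assumptions: `OffsetCallback`, `set_bit_trampoline`, `in_query`, and `fake_term_query_with_callback` are hypothetical names, the last one stands in for a function that would actually be exported from Rust, and `std::vector<bool>` stands in for `TargetBitmap`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

extern "C" {
// C-ABI callback type: a function pointer plus an opaque context pointer.
// A capturing lambda cannot be passed directly, so its state goes in ctx.
typedef void (*OffsetCallback)(void* ctx, uint32_t offset);
}

// Hypothetical stand-in for the Rust export (assumption). A real version
// would walk the posting list of `term`; this toy index has 9 rows and
// row i holds the value i % 3.
extern "C" void
fake_term_query_with_callback(int64_t term, OffsetCallback cb, void* ctx) {
    for (uint32_t offset = 0; offset < 9; ++offset) {
        if (static_cast<int64_t>(offset % 3) == term) {
            cb(ctx, offset);  // report the hit without building an array
        }
    }
}

// Trampoline: recovers the bitset from ctx and sets the matching bit.
// It must not throw, since unwinding across the FFI boundary is not safe.
extern "C" void
set_bit_trampoline(void* ctx, uint32_t offset) {
    auto* bits = static_cast<std::vector<bool>*>(ctx);
    assert(offset < bits->size());
    (*bits)[offset] = true;
}

// Sketch of an In()-style query that fills the bitset in place.
std::vector<bool>
in_query(size_t row_count, const std::vector<int64_t>& values) {
    std::vector<bool> bits(row_count, false);
    for (int64_t v : values) {
        fake_term_query_with_callback(v, set_bit_trampoline, &bits);
    }
    return bits;
}
```

Whether one indirect call per matching offset is actually cheaper than copying the array would need to be measured; passing the bitset's raw buffer to Rust may be an alternative worth comparing.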
Anything else?
No response