[Enhancement]: query optimization of inverted index under high selectivity

Open • longjiquan opened this issue 8 months ago • 1 comment

Is there an existing issue for this?

  • [X] I have searched the existing issues

What would you like to be added?

I want to optimize queries on the inverted index when selectivity is very high, i.e., when a large fraction of rows match the query.

Why is this needed?

Currently, a query on the inverted index returns an array of the row offsets that match, and that array is then applied to the target bitset. Take the term (point) query as an example:

https://github.com/milvus-io/milvus/blob/41714142229e9a3d4e9e39f89868a53bf36a4e66/internal/core/src/index/InvertedIndexTantivy.cpp#L200-L216

When selectivity is very high, this result array can be very large, so the cost of building it and passing it from Rust to C cannot be ignored.
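To make that cost concrete, here is a minimal, self-contained C++ model of the current flow. It is a sketch, not Milvus code: a hypothetical `Postings` map stands in for the Tantivy index, `std::vector<bool>` stands in for `TargetBitmap`, and `term_query` / `in_via_offsets_array` are illustrative names. The point is that the intermediate array grows with the number of matching rows, which is exactly what becomes expensive under high selectivity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the Tantivy-backed index: each term maps to the
// row offsets (postings) that contain it.
using Postings = std::unordered_map<std::int64_t, std::vector<std::uint32_t>>;

// Current flow, step 1: the index materializes every matching offset into an
// intermediate array and returns it to the caller.
std::vector<std::uint32_t> term_query(const Postings& index, std::int64_t value) {
    auto it = index.find(value);
    if (it == index.end()) {
        return {};
    }
    return it->second;  // copies all matching offsets
}

// Current flow, step 2: the caller walks the returned array and sets bits in
// the target bitset. `materialized` counts how many offsets were copied.
std::vector<bool> in_via_offsets_array(const Postings& index, std::size_t num_rows,
                                       std::size_t n, const std::int64_t* values,
                                       std::size_t* materialized) {
    std::vector<bool> bitset(num_rows, false);
    *materialized = 0;
    for (std::size_t i = 0; i < n; ++i) {
        std::vector<std::uint32_t> offsets = term_query(index, values[i]);
        *materialized += offsets.size();
        for (std::uint32_t offset : offsets) {
            assert(offset < bitset.size());
            bitset[offset] = true;
        }
    }
    return bitset;
}
```

With postings `{7: [0, 2, 4], 9: [1]}` and query values `{7, 9, 42}`, four offsets are copied into intermediate arrays before any bit is set; when most rows match, that count approaches the row count.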

If the inverted index query could modify the bitset in place, the cost of transferring data between Rust and C would be saved. One approach is a callback function, which could be used like this:

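 template <typename T>
 const TargetBitmap
 InvertedIndexTantivy<T>::In(size_t n, const T* values) {
     TargetBitmap bitset(Count());
     // Set the bit for each matching offset directly; no offsets array is built.
     auto callback = [&bitset](uint32_t offset) -> void { bitset[offset] = true; };
     for (size_t i = 0; i < n; ++i) {
         // Proposed (not yet existing) API: the wrapper reports each match via the callback.
         wrapper_->term_query_with_callback(values[i], callback);
     }
     return bitset;
 }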

To complete this, we have to do two things:

  • Investigate how to pass a function from C to Rust;
  • Refactor the inverted index queries so that, instead of returning the offsets array, they call the function for each matching offset.
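A common answer to the first item is a plain C function pointer plus an opaque `void*` context, because a capturing lambda like the one above cannot cross a C ABI directly. The sketch below shows that pattern together with the second item (the query reports each offset through the callback rather than returning an array). It is a minimal sketch under stated assumptions: `offset_callback_t`, `term_query_with_callback`, `trampoline`, and the `Postings` map are hypothetical, and the Rust side is modeled in C++ so the example runs on its own. On the Rust side the matching parameter would plausibly be declared as something like `extern "C" fn(*mut c_void, u32)`, but that signature is an assumption, not existing Milvus code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical C-ABI callback type: invoked once per matching offset. A plain
// function pointer cannot capture state, so caller state travels in `ctx`.
typedef void (*offset_callback_t)(void* ctx, std::uint32_t offset);

// Hypothetical stand-in for the index that would live on the Rust side.
using Postings = std::unordered_map<std::int64_t, std::vector<std::uint32_t>>;

// Hypothetical FFI entry point. In the proposed design this would be exported
// from Rust; it is modeled in C++ here so the sketch runs on its own.
extern "C" void term_query_with_callback(const void* index, std::int64_t value,
                                         offset_callback_t cb, void* ctx) {
    const auto* postings = static_cast<const Postings*>(index);
    auto it = postings->find(value);
    if (it == postings->end()) {
        return;
    }
    for (std::uint32_t offset : it->second) {
        cb(ctx, offset);  // report each offset; no array is returned
    }
}

// Trampoline: casts `ctx` back to the concrete callable and invokes it. This
// is the usual way to route a capturing C++ lambda through a C function pointer.
template <typename F>
void trampoline(void* ctx, std::uint32_t offset) {
    (*static_cast<F*>(ctx))(offset);
}

// C++-side convenience wrapper: accepts any callable taking an offset.
template <typename F>
void term_query(const Postings& index, std::int64_t value, F& callback) {
    term_query_with_callback(&index, value, &trampoline<F>, &callback);
}

// Mirrors the proposed In(): bits are set in place as offsets are reported.
std::vector<bool> in_via_callback(const Postings& index, std::size_t num_rows,
                                  std::size_t n, const std::int64_t* values) {
    std::vector<bool> bitset(num_rows, false);
    auto set_bit = [&bitset](std::uint32_t offset) {
        assert(offset < bitset.size());
        bitset[offset] = true;
    };
    for (std::size_t i = 0; i < n; ++i) {
        term_query(index, values[i], set_bit);
    }
    return bitset;
}
```

Passing `ctx` explicitly keeps the Rust side free of any C++ types. The trade-off is one indirect call per matching offset, so whether this beats copying an offsets array is something to measure rather than assume.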

Anything else?

No response

longjiquan • May 31 '24 03:05