raft
raft copied to clipboard
[FEA] Batched k-selection
This issue is to track discussion and implementation of an efficient mechanism for supporting batched k-selection. The current discussion has been around allowing "restart" in batches (potentially by storing off the max distance found thus far and implementing an ANN pre-filter to exclude all neighbors below that distance). Ultimately, we would like to support batching the computation of the k's so a user could say "give me the closest 1-5 neighbors, then the closest 6-10 neighbors, then the closest 11-15 neighbors, etc...) and not have to load the entire set in memory at the same time.
I talked to @spartee. He said they call this query aggregation and they have recently added it on the CPU side. @spartee can add more detail.
On the raft's side we often use the word batch for the groups of queries processed by a single select_k. How about changing the word batched here with paged to depict that we page through a single list of k_big neighbors, k_small neighbors at a time?