Use IndexInput#prefetch in Exact search

Open shatejas opened this issue 10 months ago • 3 comments
Description

Exact search evaluates vectors linearly. Using IndexInput#prefetch to load the next vector into memory ahead of time could reduce the read cost at query time and, with it, latency. Prefetch issues a madvise(MADV_WILLNEED) system call; the kernel may use this hint to load the given range of bytes asynchronously.

We need to benchmark and see if this yields improvements.
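
The linear scan with a one-vector lookahead described above could look roughly like the sketch below. It is not k-NN or Lucene code: `VectorInput`, `ArrayVectorInput`, and `ExactSearchSketch` are hypothetical names, and `VectorInput#prefetch` is only modelled on the shape of `IndexInput#prefetch(long offset, long length)`. The in-memory implementation merely records the hints so the sketch runs without Lucene on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the small part of Lucene's IndexInput used here. */
interface VectorInput {
    /** Hint that bytes [offset, offset + length) will be read soon; may be a no-op. */
    void prefetch(long offset, long length);

    /** Reads dst.length floats starting at the given byte offset. */
    void readFloats(long offset, float[] dst);
}

/** In-memory implementation that records prefetch hints instead of calling madvise. */
final class ArrayVectorInput implements VectorInput {
    private final float[] data;
    final List<long[]> prefetchCalls = new ArrayList<>();

    ArrayVectorInput(float[] data) {
        this.data = data;
    }

    @Override
    public void prefetch(long offset, long length) {
        prefetchCalls.add(new long[] {offset, length});
    }

    @Override
    public void readFloats(long offset, float[] dst) {
        int start = (int) (offset / Float.BYTES);
        System.arraycopy(data, start, dst, 0, dst.length);
    }
}

final class ExactSearchSketch {
    /** Returns the ordinal of the vector with the highest dot product against the query. */
    static int bestByDotProduct(VectorInput in, int numVectors, float[] query) {
        int dims = query.length;
        long vectorBytes = (long) dims * Float.BYTES;
        float[] scratch = new float[dims];
        int best = -1;
        float bestScore = Float.NEGATIVE_INFINITY;
        for (int ord = 0; ord < numVectors; ord++) {
            // Hint the next vector before scoring the current one, so the
            // read can overlap with the distance computation.
            if (ord + 1 < numVectors) {
                in.prefetch((ord + 1) * vectorBytes, vectorBytes);
            }
            in.readFloats(ord * vectorBytes, scratch);
            float score = 0f;
            for (int i = 0; i < dims; i++) {
                score += scratch[i] * query[i];
            }
            if (score > bestScore) {
                bestScore = score;
                best = ord;
            }
        }
        return best;
    }
}
```

Whether a lookahead of one vector is enough, or whether hinting several vectors at once works better, is exactly what the benchmark would need to show.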

Pre-requisites

  • Lucene 10.x: the prefetch API is only available in Lucene 10.x
  • Lucene changes to support prefetch in FloatVectorValues: this is currently not supported and requires a Lucene contribution

This can help speed up filtered queries, rescoring, and exact search scripting.

shatejas avatar Jan 23 '25 01:01 shatejas

A similar mechanism is being addressed here for searchable snapshots in core, where read-ahead of blocks can be performed based on file type. So for exact search, if flat vector files are used, access to those files can implicitly benefit from the read-ahead functionality in sequential access cases. This could tie in well with the prefetch interface later, where the accessor can indicate when to perform read-ahead and when not to (random access).

sohami avatar Jan 23 '25 20:01 sohami

@sohami Thanks for the reference. I would be interested in the low-level RFC/implementation. Currently there are only specific cases where we want prefetch, since it affects search latencies for the Lucene engine (and with partial loading it might affect the Faiss engine as well). It's easy to add a prefetch API to FloatVectorValues that uses IndexInput#prefetch, and then call prefetch based on how many vectors you need instead of a predefined block of data.
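
As a rough illustration of prefetching by vector count rather than by a fixed block, the sketch below translates "the next N vectors" into a byte range. All names (`ByteRangeHint`, `FlatVectorPrefetcher`) are hypothetical. It assumes a flat file of fixed-size float vectors with no header, and `ByteRangeHint` only stands in for something like `IndexInput#prefetch(long offset, long length)`. It is not an existing FloatVectorValues API.

```java
/** Hypothetical byte-range hint, standing in for IndexInput#prefetch(offset, length). */
@FunctionalInterface
interface ByteRangeHint {
    void prefetch(long offset, long length);
}

/** Hypothetical vector-level prefetch over a flat file of fixed-size float vectors. */
final class FlatVectorPrefetcher {
    private final ByteRangeHint input;
    private final int dims;
    private final int size;

    FlatVectorPrefetcher(ByteRangeHint input, int dims, int size) {
        this.input = input;
        this.dims = dims;
        this.size = size;
    }

    /**
     * Hints that {@code count} vectors starting at {@code firstOrd} will be read.
     * The range is clamped to the end of the data; returns the number of vectors covered.
     */
    int prefetch(int firstOrd, int count) {
        if (firstOrd < 0 || count < 0) {
            throw new IllegalArgumentException("firstOrd and count must be non-negative");
        }
        int covered = Math.max(0, Math.min(count, size - firstOrd));
        if (covered > 0) {
            long vectorBytes = (long) dims * Float.BYTES;
            // One hint for the whole run of vectors the caller asked for.
            input.prefetch(firstOrd * vectorBytes, covered * vectorBytes);
        }
        return covered;
    }
}
```

A caller that knows it will rescore, say, the next 32 candidates would call `prefetch(ord, 32)` once, while a random-access caller would simply not call it.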

shatejas avatar Jan 24 '25 22:01 shatejas

Just to clarify, the read-ahead mechanism is specific to the remote store directory, so that it can pre-download blocks for sequential access. It is not tied to prefetching data into the operating system file cache.

> currently there are only specific cases where we want prefetch since it affects search latencies for lucene engine

Can you shed more light on this?

sohami avatar Jan 31 '25 17:01 sohami