luceneutil

Add support for KNN ParentJoin Benchmarks

Open • vigyasharma opened this issue 7 months ago • 1 comment

With Lucene supporting parent-block joins for KNN vectors, and ongoing work on multi-vector support, it would be good to have common, shared benchmarks that measure recall, latency, etc. for multiple vector values that belong to the same group (either child documents of the same parent document, or multiple vectors in the same document).
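For the parent-document case, Lucene's block join expects each group to be indexed as one contiguous block with the children first and the parent last (for example, a single IndexWriter.addDocuments call per group). As a result, a child's parent is the next parent doc ID at or after it. The pure-Java sketch below only simulates that doc ID layout without Lucene. Grouping passages under a wiki article id, and the BlockLayout/Layout names, are hypothetical and chosen for illustration.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: assigns doc IDs so that each group's children come
 * first and the parent comes last, mimicking block-join index ordering.
 */
public class BlockLayout {

    /** Doc keys in doc ID order, plus a bitset marking which doc IDs are parents. */
    public static final class Layout {
        public final BitSet parents = new BitSet();
        public final List<String> docKeys = new ArrayList<>();

        /** For a child doc ID, its parent is the next parent bit at or after it. */
        public int parentOf(int childDocId) {
            return parents.nextSetBit(childDocId);
        }
    }

    /** passagesByArticle maps a (hypothetical) article id to its passage ids. */
    public static Layout layout(Map<String, List<String>> passagesByArticle) {
        Layout layout = new Layout();
        for (Map.Entry<String, List<String>> e : passagesByArticle.entrySet()) {
            for (String passage : e.getValue()) {
                layout.docKeys.add(passage); // child docs first
            }
            // The parent takes the doc ID immediately after its last child.
            layout.parents.set(layout.docKeys.size());
            layout.docKeys.add(e.getKey());
        }
        return layout;
    }

    public static void main(String[] args) {
        Map<String, List<String>> input = new LinkedHashMap<>();
        input.put("articleA", List.of("a0", "a1", "a2"));
        input.put("articleB", List.of("b0", "b1"));
        Layout layout = layout(input);
        System.out.println(layout.docKeys);     // [a0, a1, a2, articleA, b0, b1, articleB]
        System.out.println(layout.parents);     // {3, 6}
        System.out.println(layout.parentOf(4)); // 6
    }
}
```

The parent bitset plays the role that a parent filter plays for block-join queries: it is all that is needed to map any child hit back to its group.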

This is a parent ;) issue to collate the subtasks involved in adding ParentJoin benchmarks for KNN.

Task List:

  • [x] Fetch Cohere wiki metadata needed to create parent-child relationships in documents
  • [ ] Create index with parent-child relationships in documents
  • [ ] Leverage exactSearch from Lucene to compute the true NearestNeighbors for the recall baseline
  • [ ] Use DiversifyingChildrenFloatKnnVectorQuery on the candidate to benchmark ParentJoin recall and latency
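Here is a minimal sketch of the recall baseline from the last two tasks. It assumes dot-product similarity and plain in-memory arrays rather than Lucene's exactSearch. It brute-forces every child vector, scores each parent by its best child, keeps the top k parents, and measures recall against an approximate list of parent ids. As far as we understand, DiversifyingChildrenFloatKnnVectorQuery returns at most one (the best-scoring) child per parent, so its child hits would first need mapping to parent ids. ParentJoinRecall and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of an exact parent-level kNN baseline and recall@k. */
public class ParentJoinRecall {

    // Assumed similarity: plain dot product.
    static float dot(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) sum += a[i] * b[i];
        return sum;
    }

    /**
     * childVectors[i] belongs to parent childToParent[i]. Returns up to k parent
     * ids, best first; a parent's score is the max similarity over its children.
     */
    public static List<Integer> exactTopKParents(float[] query, float[][] childVectors,
                                                 int[] childToParent, int k) {
        Map<Integer, Float> best = new HashMap<>();
        for (int i = 0; i < childVectors.length; i++) {
            float score = dot(query, childVectors[i]);
            best.merge(childToParent[i], score, Math::max); // keep the best child score
        }
        List<Integer> parents = new ArrayList<>(best.keySet());
        // Descending score; ties broken by parent id so results are deterministic.
        parents.sort((p, q) -> {
            int cmp = Float.compare(best.get(q), best.get(p));
            return cmp != 0 ? cmp : Integer.compare(p, q);
        });
        return parents.subList(0, Math.min(k, parents.size()));
    }

    /** Fraction of the exact top-k parents that also appear in the approximate results. */
    public static double recall(List<Integer> exact, List<Integer> approximate) {
        if (exact.isEmpty()) return 1.0;
        Set<Integer> approx = new HashSet<>(approximate);
        int hits = 0;
        for (int p : exact) if (approx.contains(p)) hits++;
        return (double) hits / exact.size();
    }

    public static void main(String[] args) {
        float[][] children = { {1f, 0f}, {0f, 1f}, {0.9f, 0.1f}, {0.2f, 0.8f} };
        int[] childToParent = { 10, 10, 20, 30 };
        float[] query = {1f, 0f};
        List<Integer> exact = exactTopKParents(query, children, childToParent, 2);
        System.out.println(exact);                          // [10, 20]
        System.out.println(recall(exact, List.of(10, 30))); // 0.5
    }
}
```

Scoring a parent by its best child matches per-parent diversification. A different parent scoring function would need a different baseline.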

vigyasharma, Jul 23 '24 22:07