luceneutil
Add support for KNN ParentJoin Benchmarks
With Lucene supporting parent-block joins for KNN vectors, and ongoing work around Multi-Vector support, it would be good to have shared benchmarks that can test recall, latency, etc. for multiple vector values that belong to the same group (either a parent document, or multiple vectors in the same document).
This issue is a parent ;) issue to collate the subtasks involved with adding ParentJoin benchmarks for KNN.
Task List:
- [x] Fetch Cohere wiki metadata needed to create parent-child relationships in documents
- [ ] Create index with parent-child relationships in documents
- [ ] Leverage `exactSearch` from Lucene to compute NearestNeighbors for recall baseline
- [ ] Use `DiversifyingChildrenFloatKnnVectorQuery` on candidate to benchmark ParentJoin recall and latency
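
To make the recall baseline in the task list concrete, here is a minimal sketch of how parent-level recall could be computed. It is not Lucene's `exactSearch` and not luceneutil code: the function names (`exact_parent_knn`, `parent_recall`), the dot-product similarity, and the in-memory `(parent_id, vector)` layout are all assumptions made for illustration. The idea is to brute-force score every child vector, keep only the best-scoring child per parent (the "diversifying" step), and compare the resulting top-k parents with what the approximate query returned.

```python
import heapq
from typing import Dict, List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot-product similarity (assumed here; the real benchmark may use another metric)."""
    return sum(x * y for x, y in zip(a, b))


def exact_parent_knn(
    query: Sequence[float],
    children: List[Tuple[str, Sequence[float]]],
    k: int,
) -> List[str]:
    """Brute-force top-k parents, scoring each parent by its best child vector.

    `children` is a hypothetical in-memory list of (parent_id, child_vector) pairs.
    """
    best_per_parent: Dict[str, float] = {}
    for parent_id, vector in children:
        score = dot(query, vector)
        # Keep only the highest-scoring child for each parent.
        if parent_id not in best_per_parent or score > best_per_parent[parent_id]:
            best_per_parent[parent_id] = score
    top = heapq.nlargest(k, best_per_parent.items(), key=lambda kv: kv[1])
    return [parent_id for parent_id, _ in top]


def parent_recall(expected: List[str], actual: List[str]) -> float:
    """Fraction of the exact top-k parents that the approximate search also returned."""
    if not expected:
        return 0.0
    return len(set(expected) & set(actual)) / len(expected)
```

In a benchmark, `expected` would come from the exact baseline and `actual` from the parent IDs resolved from the approximate ParentJoin query hits, averaged over all query vectors.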