lance
perf: implement XTR for retrieving multivector
This PR introduces XTR, which can score documents without the original multivectors, so searching on a multivector column needs no extra IO.
For each query vector, documents that were not retrieved ("missed" documents) are assigned that query vector's minimum retrieved similarity as their estimated similarity.
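To make the scoring rule above concrete, here is a minimal sketch in plain Rust. It is not the code from `rust/lance/src/io/exec/knn.rs`: the `Hit` type and the `xtr_scores` function are hypothetical names introduced for illustration, and two details are assumptions rather than anything stated in the PR: the per-query-vector similarities are summed without normalization, and a query vector with no hits contributes `0.0`.

```rust
use std::collections::HashMap;

/// One retrieved token vector: the document it belongs to and its similarity
/// to the query vector that retrieved it. (Hypothetical type for illustration.)
#[derive(Debug, Clone, Copy)]
pub struct Hit {
    pub doc_id: u64,
    pub similarity: f32,
}

/// Estimate document scores from per-query-vector top-k hits only.
/// `hits_per_query[i]` holds the hits retrieved for the i-th query vector.
pub fn xtr_scores(hits_per_query: &[Vec<Hit>]) -> HashMap<u64, f32> {
    // For each query vector: best similarity per retrieved document.
    let mut best: Vec<HashMap<u64, f32>> = Vec::with_capacity(hits_per_query.len());
    // For each query vector: the similarity imputed to documents it missed.
    let mut missed: Vec<f32> = Vec::with_capacity(hits_per_query.len());

    for hits in hits_per_query {
        let mut per_doc: HashMap<u64, f32> = HashMap::new();
        let mut min_sim = f32::INFINITY;
        for hit in hits {
            min_sim = min_sim.min(hit.similarity);
            per_doc
                .entry(hit.doc_id)
                .and_modify(|s| *s = (*s).max(hit.similarity))
                .or_insert(hit.similarity);
        }
        // Assumption: with no hits there is nothing to impute from, so use 0.0.
        missed.push(if hits.is_empty() { 0.0 } else { min_sim });
        best.push(per_doc);
    }

    // Every document retrieved by at least one query vector is a candidate.
    let mut scores: HashMap<u64, f32> = HashMap::new();
    for per_doc in &best {
        for &doc_id in per_doc.keys() {
            scores.entry(doc_id).or_insert(0.0);
        }
    }

    // Add the retrieved similarity where present, the imputed one otherwise.
    for (per_doc, &fallback) in best.iter().zip(&missed) {
        for (doc_id, score) in scores.iter_mut() {
            *score += per_doc.get(doc_id).copied().unwrap_or(fallback);
        }
    }
    scores
}
```

As a worked example, suppose query vector 0 retrieves document 1 at 0.75 and document 2 at 0.5, and query vector 1 retrieves document 1 at 0.5 and 0.25 and document 3 at 0.25. The imputed similarities are then 0.5 and 0.25, so document 1 scores 0.75 + 0.5 = 1.25, document 2 scores 0.5 + 0.25 (imputed) = 0.75, and document 3 scores 0.5 (imputed) + 0.25 = 0.75. No original multivector is read at any point.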
Codecov Report
Attention: Patch coverage is 85.31746% with 37 lines in your changes missing coverage. Please review.
Project coverage is 78.48%. Comparing base (33ae43b) to head (8d5a835).
| Files with missing lines | Patch % | Lines |
|---|---|---|
| rust/lance/src/io/exec/knn.rs | 85.44% | 24 Missing and 7 partials :warning: |
| rust/lance/src/dataset/scanner.rs | 79.31% | 1 Missing and 5 partials :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## main #3437 +/- ##
==========================================
- Coverage 78.48% 78.48% -0.01%
==========================================
Files 252 252
Lines 94011 94220 +209
Branches 94011 94220 +209
==========================================
+ Hits 73783 73947 +164
- Misses 17232 17279 +47
+ Partials 2996 2994 -2
| Flag | Coverage Δ | |
|---|---|---|
| unittests | 78.48% <85.31%> (-0.01%) | :arrow_down: |
Something seems off in the algorithm, with how `missed_similarities` is handled. Could you address my comment, and also maybe write a unit test that shows we get correct results out of this?
we have tests here https://github.com/lancedb/lance/pull/3437/files#diff-6de816b72e7c722316243c57df4f809ad34dc8581367c72335154dada48c40edL993
I meant more to test that the XTR algorithm itself was working as expected. Part of why I'm having a hard time understanding this PR is that there are no tests showing the expected behavior of the algorithm.