lance
feat: build inverted index in a distributed manner
Related to #3269.
I want to build an inverted index for Lance on a distributed system (Ray/Spark). So far, I have modified the index creation interface so that an array of fragment IDs can be passed in. When this array is passed in, the index creation interface returns an index object.
I also changed the CreateIndex operation definition in Python to make it match the Rust version. I don't know why it differed from the Rust version.
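To illustrate the idea of building per fragment and combining afterwards, here is a minimal pure-Python sketch. It does not use the Lance API: `FRAGMENTS`, `build_partial_index`, and `merge_indices` are hypothetical names, and the merge step is an assumption about how partial results could be combined, not a description of what this PR does.

```python
from collections import defaultdict

# Toy data: fragment ID -> list of (row_id, text).
# Hypothetical layout for illustration, not Lance's storage format.
FRAGMENTS = {
    0: [(0, "lance is a columnar format"), (1, "inverted index for text")],
    1: [(2, "distributed index build"), (3, "lance index on ray")],
    2: [(4, "spark and ray workers")],
}


def build_partial_index(fragment_ids):
    """Build postings (token -> sorted row IDs) for the given fragments only."""
    postings = defaultdict(set)
    for frag_id in fragment_ids:
        for row_id, text in FRAGMENTS[frag_id]:
            for token in text.lower().split():
                postings[token].add(row_id)
    return {tok: sorted(rows) for tok, rows in postings.items()}


def merge_indices(partials):
    """Union the postings lists of several partial indices (assumed merge step)."""
    merged = defaultdict(set)
    for partial in partials:
        for tok, rows in partial.items():
            merged[tok].update(rows)
    return {tok: sorted(rows) for tok, rows in merged.items()}


# Each call stands in for one Ray/Spark task working on its own fragment IDs.
partials = [build_partial_index([0, 1]), build_partial_index([2])]
index = merge_indices(partials)
print(index["lance"])  # [0, 3]
print(index["ray"])    # [3, 4]
```

In this toy version, merging the partial indices gives the same postings as indexing all fragments in a single call, which is the property a distributed build needs to preserve.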
https://www.elastic.co/blog/understanding-query-then-fetch-vs-dfs-query-then-fetch
Both query-then-fetch and dfs-query-then-fetch are supported. query-then-fetch is fast but less accurate because each partition scores with its own term statistics; when the number of texts is large, this mode is good enough, and it is also the default mode in Elasticsearch. dfs-query-then-fetch is slower but accurate, which is very useful when the data is skewed or small.
As a next step, I also want to run FTS queries in a distributed manner.
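The difference between the two modes can be sketched with a toy scorer. This is not the scoring code from this PR: `SHARDS`, `score_shard`, and the simplified `tf * idf` score are assumptions for illustration, with the IDF written in the common BM25 form `ln(1 + (N - df + 0.5) / (df + 0.5))`.

```python
import math

# Two toy shards with skewed data: "apple" is common in the first, rare in the second.
SHARDS = [
    ["apple pie", "apple tart", "apple cider", "banana bread"],
    ["apple strudel", "cherry pie", "banana split", "plum jam"],
]


def doc_freq(shard, term):
    """Number of documents in the shard that contain the term."""
    return sum(term in doc.split() for doc in shard)


def idf(n_docs, df):
    """BM25-style inverse document frequency."""
    return math.log(1 + (n_docs - df + 0.5) / (df + 0.5))


def score_shard(shard, term, n_docs, df):
    """Score matching documents as tf * idf using the supplied statistics."""
    weight = idf(n_docs, df)
    return [
        (doc, doc.split().count(term) * weight)
        for doc in shard
        if term in doc.split()
    ]


def query_then_fetch(term):
    """Each shard scores with its own local statistics (fast, approximate)."""
    hits = []
    for shard in SHARDS:
        hits += score_shard(shard, term, len(shard), doc_freq(shard, term))
    return sorted(hits, key=lambda h: -h[1])


def dfs_query_then_fetch(term):
    """Collect global statistics first, then score every shard with them."""
    n_docs = sum(len(shard) for shard in SHARDS)
    df = sum(doc_freq(shard, term) for shard in SHARDS)
    hits = []
    for shard in SHARDS:
        hits += score_shard(shard, term, n_docs, df)
    return sorted(hits, key=lambda h: -h[1])


local_hits = query_then_fetch("apple")
global_hits = dfs_query_then_fetch("apple")
print(local_hits[0][0])  # apple strudel
```

With local statistics, "apple strudel" ranks first only because "apple" is rare on its shard. With global statistics, all four matching documents receive the same score, since each contains the term exactly once. The extra round trip to gather statistics is what makes the dfs mode slower.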
Codecov Report
Attention: Patch coverage is 57.98611% with 121 lines in your changes missing coverage. Please review.
Project coverage is 78.80%. Comparing base (e8f4d98) to head (86815a0).
Additional details and impacted files
@@ Coverage Diff @@
## main #3452 +/- ##
==========================================
- Coverage 78.87% 78.80% -0.08%
==========================================
Files 256 256
Lines 96493 96667 +174
Branches 96493 96667 +174
==========================================
+ Hits 76110 76177 +67
- Misses 17315 17419 +104
- Partials 3068 3071 +3
| Flag | Coverage Δ | |
|---|---|---|
| unittests | 78.80% <57.98%> (-0.08%) | :arrow_down: |
@BubbleCal could you please help review this PR?
@wjones127 Can you give a quick review for landing this PR?
ACTION NEEDED Lance follows the Conventional Commits specification for release automation.
The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.
For details on the error please inspect the "PR Title Check" action.
@chenkovsky, can you rebase this PR?
Thank you for your contribution. This PR has been inactive for a while, so we're closing it to free up bandwidth. Feel free to reopen it if you still find it useful.