feat: build inverted index in a distributed manner

Open chenkovsky opened this issue 9 months ago • 2 comments

Related to #3269.

I want to build an inverted index for Lance on a distributed system (Ray/Spark). So far, I have modified the index-creation interface to accept an array of fragment IDs. When this array is passed in, the interface returns an index object.

I also changed the definition of the CreateIndex operation in Python to make it similar to the Rust version. I don't know why it differed from the Rust version.
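To illustrate the idea, here is a minimal, self-contained sketch of building one partial inverted index per group of fragment IDs and then querying across the partial indices. It does not use the Lance API: `FRAGMENTS`, `build_partial_index`, `build_all`, and `lookup` are hypothetical names, the whitespace tokenizer is a placeholder, and a thread pool stands in for Ray or Spark workers.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Toy stand-in for Lance fragments: fragment ID -> list of (row_id, text).
FRAGMENTS = {
    0: [(0, "lance columnar format"), (1, "inverted index for text")],
    1: [(2, "distributed index build"), (3, "lance full text search")],
    2: [(4, "text search with lance")],
}


def build_partial_index(fragment_ids):
    """Build an inverted index (token -> sorted row IDs) over the given fragments only."""
    postings = defaultdict(set)
    for frag_id in fragment_ids:
        for row_id, text in FRAGMENTS[frag_id]:
            for token in text.lower().split():
                postings[token].add(row_id)
    return {token: sorted(rows) for token, rows in postings.items()}


def build_all(fragment_groups):
    """Build one partial index per group; each call could run on a separate worker."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(build_partial_index, fragment_groups))


def lookup(partials, token):
    """Union the posting lists for a token across all partial indices."""
    rows = set()
    for partial in partials:
        rows.update(partial.get(token, []))
    return sorted(rows)


# Two workers: one handles fragment 0, the other handles fragments 1 and 2.
partials = build_all([[0], [1, 2]])
print(lookup(partials, "lance"))
```

Because each worker only touches its own fragments, the partial builds are independent of one another. Whether the partial indices are later merged into one index or kept separate and combined at query time is a design choice this sketch leaves open.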

https://www.elastic.co/blog/understanding-query-then-fetch-vs-dfs-query-then-fetch

Both query-then-fetch and dfs-query-then-fetch are supported. query-then-fetch is fast but less accurate; when the number of documents is large, this mode is good enough, and it is also the default mode in Elasticsearch. dfs-query-then-fetch is slower but accurate, which is very useful when the data is skewed or small.

As a next step, I also want to run FTS queries in a distributed manner.
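The difference between the two modes comes down to which term statistics are used for scoring. The sketch below is an illustration, not the code in this PR: `SHARDS`, `stats`, `idf`, `score`, and `search` are hypothetical names, and the scoring is reduced to term frequency times a BM25-style smoothed IDF. With `dfs=False`, each shard scores with its own document frequencies; with `dfs=True`, a pre-pass sums the statistics from all shards so every shard scores with the same global values.

```python
import math
from collections import Counter

# Toy shards with skewed data: "lance" is common in shard 0 and rare in shard 1.
SHARDS = [
    ["lance index", "lance format", "lance search"],
    ["rust index", "python bindings", "lance rust"],
]


def stats(docs):
    """Return (document count, document frequency per token) for one shard."""
    df = Counter()
    for doc in docs:
        df.update(set(doc.split()))
    return len(docs), df


def idf(n_docs, doc_freq):
    """BM25-style smoothed inverse document frequency."""
    return math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))


def score(doc, query, n_docs, df):
    """Sum of term frequency * IDF over the query tokens present in the document."""
    tokens = doc.split()
    return sum(tokens.count(t) * idf(n_docs, df[t]) for t in query.split() if t in tokens)


def search(query, dfs):
    """Score every shard, using global statistics if dfs is True, local ones otherwise."""
    if dfs:
        # Pre-pass: collect and sum the statistics of all shards.
        n_total, df_total = 0, Counter()
        for shard in SHARDS:
            n, df = stats(shard)
            n_total += n
            df_total += df
    hits = []
    for shard in SHARDS:
        n, df = (n_total, df_total) if dfs else stats(shard)
        hits += [(score(doc, query, n, df), doc) for doc in shard]
    return sorted(((s, d) for s, d in hits if s > 0), reverse=True)


local_hits = search("lance", dfs=False)
global_hits = search("lance", dfs=True)
print(local_hits)
print(global_hits)
```

With local statistics, "lance rust" outranks the other matches only because "lance" happens to be rare in its shard. With global statistics, all four matching documents receive the same score, which is what a single index over all the data would produce. The extra round trip for collecting statistics is the cost of the dfs mode.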

chenkovsky avatar Feb 14 '25 14:02 chenkovsky

Codecov Report

Attention: Patch coverage is 57.98611% with 121 lines in your changes missing coverage. Please review.

Project coverage is 78.80%. Comparing base (e8f4d98) to head (86815a0).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| rust/lance/src/io/exec/fts.rs | 42.52% | 41 Missing and 9 partials :warning: |
| rust/lance-index/src/scalar/inverted/index.rs | 79.82% | 20 Missing and 3 partials :warning: |
| rust/lance/src/index/scalar.rs | 5.88% | 15 Missing and 1 partial :warning: |
| rust/lance/src/dataset/scanner.rs | 31.81% | 15 Missing :warning: |
| rust/lance-index/src/scalar.rs | 10.00% | 9 Missing :warning: |
| rust/lance/src/index.rs | 83.33% | 1 Missing and 3 partials :warning: |
| rust/lance-index/src/scalar/inverted/wand.rs | 76.92% | 3 Missing :warning: |
| rust/lance/src/dataset.rs | 0.00% | 1 Missing :warning: |

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3452      +/-   ##
==========================================
- Coverage   78.87%   78.80%   -0.08%     
==========================================
  Files         256      256              
  Lines       96493    96667     +174     
  Branches    96493    96667     +174     
==========================================
+ Hits        76110    76177      +67     
- Misses      17315    17419     +104     
- Partials     3068     3071       +3     
Flag Coverage Δ
unittests 78.80% <57.98%> (-0.08%) :arrow_down:

codecov-commenter avatar Feb 14 '25 15:02 codecov-commenter

@BubbleCal could you please help review this PR?

chenkovsky avatar Feb 22 '25 09:02 chenkovsky

@wjones127 Could you give this PR a quick review so it can land?

yanghua avatar Mar 14 '25 07:03 yanghua

ACTION NEEDED Lance follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

For details on the error please inspect the "PR Title Check" action.

github-actions[bot] avatar Mar 14 '25 11:03 github-actions[bot]

@chenkovsky, can you rebase this PR?

yanghua avatar May 23 '25 09:05 yanghua

Thank you for your contribution. This PR has been inactive for a while, so we're closing it to free up bandwidth. Feel free to reopen it if you still find it useful.

github-actions[bot] avatar Nov 16 '25 02:11 github-actions[bot]