lance icon indicating copy to clipboard operation
lance copied to clipboard

feat: object store registry for custom object store providers

Open maxburke opened this issue 1 year ago • 4 comments

This change lays the groundwork for LanceDB to support custom implementations of object stores, selected by a custom URL scheme.

Ref: https://discord.com/channels/1030247538198061086/1197630564254101618/1208915798425477181

maxburke avatar Jun 22 '24 18:06 maxburke

ACTION NEEDED Lance follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

For details on the error please inspect the "PR Title Check" action.

github-actions[bot] avatar Jun 22 '24 18:06 github-actions[bot]

Thanks for making a PR. I'm in favor of making the registry extensible. I think a good property of an extensible registry is that the default values are able to be inserted using the extension point. That ensures that third-parties have all the APIs they need to properly extend Lance. I don't think you have to do that refactor immediately in this PR. Instead, I've made some suggestions about some tweaks we would likely need to make to be able to use the registry for our default stores.

Thanks for your comments, @wjones127 !

One thought I had was that the new_store trait member should also be accompanied by a new_commit_handler trait member, which would help make the changes to lance-table neater. However, the CommitHandler trait is defined in lance-table, and the ObjectStoreRegistry is defined in lance-io, so something would have to move (probably CommitHandler to lance-io). Is this an OK move to make as part of this change?

maxburke avatar Jun 25 '24 23:06 maxburke

Codecov Report

Attention: Patch coverage is 56.04396% with 40 lines in your changes missing coverage. Please review.

Project coverage is 79.41%. Comparing base (9535186) to head (8264b10).

Files Patch % Lines
rust/lance-io/src/object_store.rs 65.62% 10 Missing and 1 partial :warning:
java/core/lance-jni/src/blocking_dataset.rs 0.00% 9 Missing :warning:
rust/lance/src/dataset.rs 55.00% 8 Missing and 1 partial :warning:
rust/lance/src/dataset/builder.rs 53.33% 7 Missing :warning:
rust/lance/src/dataset/write.rs 33.33% 2 Missing :warning:
rust/lance-table/src/io/commit.rs 0.00% 1 Missing :warning:
rust/lance/src/dataset/cleanup.rs 66.66% 1 Missing :warning:
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2513      +/-   ##
==========================================
- Coverage   79.44%   79.41%   -0.04%     
==========================================
  Files         221      221              
  Lines       64683    64744      +61     
  Branches    64683    64744      +61     
==========================================
+ Hits        51388    51415      +27     
- Misses      10335    10366      +31     
- Partials     2960     2963       +3     
Flag Coverage Δ
unittests 79.41% <56.04%> (-0.04%) :arrow_down:

Flags with carried forward coverage won't be shown. Click here to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov-commenter avatar Jun 28 '24 19:06 codecov-commenter

However, the CommitHandler trait is defined in lance-table, and the ObjectStoreRegistry is defined in lance-io, so something would have to move (probably CommitHandler to lance-io).

Given CommitHandler is a table-level concept, I don't think it should move into the low-level IO library. I think then I would prefer ObjectStoreRegistry be pulled out into lance or lance-table.

There is a commit_handler_from_url function that maybe should be put on the registry.

wjones127 avatar Jul 02 '24 20:07 wjones127

Given CommitHandler is a table-level concept, I don't think it should move into the low-level IO library. I think then I would prefer ObjectStoreRegistry be pulled out into lance or lance-table.

There is a commit_handler_from_url function that maybe should be put on the registry.

Would you like me to do this as part of this MR or would you like me to follow up in another MR?

edit: personally I'd be more comfortable doing it in a follow-up MR because I think it would have a big knock-on how the default object stores are constructed; I think it would need moving ObjectStore::from_uri_and_params / ObjectStore::new_from_url out of lance_io and into lance

maxburke avatar Jul 24 '24 20:07 maxburke

Given CommitHandler is a table-level concept, I don't think it should move into the low-level IO library. I think then I would prefer ObjectStoreRegistry be pulled out into lance or lance-table. There is a commit_handler_from_url function that maybe should be put on the registry.

Would you like me to do this as part of this MR or would you like me to follow up in another MR?

edit: personally I'd be more comfortable doing it in a follow-up MR because I think it would have a big knock-on how the default object stores are constructed; I think it would need moving ObjectStore::from_uri_and_params / ObjectStore::new_from_url out of lance_io and into lance

You can do that in a follow up PR.

wjones127 avatar Jul 25 '24 20:07 wjones127

Given CommitHandler is a table-level concept, I don't think it should move into the low-level IO library. I think then I would prefer ObjectStoreRegistry be pulled out into lance or lance-table. There is a commit_handler_from_url function that maybe should be put on the registry.

Would you like me to do this as part of this MR or would you like me to follow up in another MR? edit: personally I'd be more comfortable doing it in a follow-up MR because I think it would have a big knock-on how the default object stores are constructed; I think it would need moving ObjectStore::from_uri_and_params / ObjectStore::new_from_url out of lance_io and into lance

You can do that in a follow up PR.

Awesome. Is there anything else that needs to be taken care of in this MR?

maxburke avatar Jul 25 '24 21:07 maxburke