object-store-python

Project Status

Open norlandrhagen opened this issue 1 year ago • 9 comments
Hi there @roeap!

I really appreciate all the work you've done here. I was wondering a bit about the future of the project. Do you still plan on maintaining it and adding features or is it on-hold for now?

norlandrhagen avatar Aug 05 '24 18:08 norlandrhagen

@norlandrhagen - fair point, as it has been a while since I have been active here - not for a lack of interest though, just life got in the way 😆.

So I started tonight with merging all the open PRs and now want to do one round of review to see if we need some maintenance somewhere. And then do another release as soon as possible.

Going forward, I am planning on doing maintenance, reviewing PRs in a timely manner, and implementing the occasional feature if it makes sense. I am not going to be able to spend a large amount of time on this, but I can at least contribute continuously.

As this essentially mirrors the object store APIs, I am hoping that this will be sufficient to keep this attractive and, foremost, useful to users. Does that help?

roeap avatar Aug 05 '24 20:08 roeap

Thank you @roeap! Totally understand and great to know your plans.

norlandrhagen avatar Aug 05 '24 23:08 norlandrhagen

Would you mind updating the package, please?

djouallah avatar Aug 17 '24 10:08 djouallah

@roeap do you think it'd be a good idea to utilize Object Store from Apache as the backend? We could utilize PyO3.

This would take away the overhead of maintaining the Rust code (and any optimizations/issues that come with it).

ByteBaker avatar Sep 21 '24 14:09 ByteBaker

In https://github.com/kylebarron/arro3/pull/229 I made a pyo3 integration for object-store, which is designed for other Rust developers making Python packages who want to use object-store in their own crates. See https://github.com/roeap/object-store-python/issues/3 for discussion on the original goals of that. I'll publish this to crates.io whenever pyo3-async-runtimes has its 0.22 release.

I'll likely make a Python-facing wrapper as well sometime soon. It won't have the same API as object-store-python, but I'm hoping my implementation will be easier to maintain and keep up to date.

kylebarron avatar Oct 15 '24 03:10 kylebarron

@roeap do you think it'd be a good idea to utilize Object Store from Apache as the backend? We could utilize PyO3.

@ByteBaker, isn't that what this project already does?

In https://github.com/kylebarron/arro3/pull/229 I made a pyo3 integration for object-store, which is designed for other Rust developers making Python packages who want to use object-store in their own crates. See https://github.com/roeap/object-store-python/issues/3 for discussion on the original goals of that. I'll publish this to crates.io whenever pyo3-async-runtimes has its 0.22 release (https://github.com/PyO3/pyo3-async-runtimes/issues/1).

@kylebarron can you expand a bit more on the difference between your implementation and this project? I'm looking into writing Python bindings for SlateDB, which is also based on the object_store crate. I was hoping to take advantage of some prewritten PyO3 bindings for object_store.

dsgibbons avatar Oct 18 '24 11:10 dsgibbons

Sure, @dsgibbons. And for clarity I moved the object-store integration into a separate repo here: https://github.com/developmentseed/object-store-rs

Overall differences

  • Better maintained. This is subjective, and my project still has a bus factor of 1, but e.g. I've had PRs here sit for 8 months (https://github.com/roeap/object-store-python/pull/9)
  • object-store-rs is not a fork of this repo; it's a reimplementation to try and have simpler end-user APIs and easier internal maintenance.
  • IMO this repo is a bit overly complicated. The code across lib.rs and builder.rs totals like 1200 lines, and does a lot of manual re-implementation of the core object-store methods. The body of my Python function to create an S3 store is 13 lines of code. It's this simple because of smart use of the FromPyObject trait. PyAmazonS3ConfigKey is a tiny wrapper around object_store::aws::AmazonS3ConfigKey, which validates that the Python string input is indeed a valid key before it even reaches my function. Then my function can take in a HashMap<PyAmazonS3ConfigKey, String> and I don't need to do any validation in the body of my function, I can just pass it to builder.with_config.
  • Having these simple wrappers around upstream object_store config structs should hopefully mean less maintenance as well. If object_store adds a new key to object_store::aws::AmazonS3ConfigKey, I don't need to change anything on my side to support the new version; the validation will automatically still work.
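The "validate at the boundary" idea in the bullets above can be sketched in plain Python. This is only an analogy to the Rust `FromPyObject` approach, not code from object-store-rs: `S3ConfigKey`, `parse_config`, and `build_store` are hypothetical names, the key list is a small illustrative subset, and `build_store` merely stands in for passing the map to `builder.with_config`.

```python
from __future__ import annotations

from enum import Enum


class S3ConfigKey(Enum):
    """Hypothetical typed key; the members are an illustrative subset only."""

    ACCESS_KEY_ID = "aws_access_key_id"
    SECRET_ACCESS_KEY = "aws_secret_access_key"
    REGION = "aws_region"
    BUCKET = "aws_bucket"

    @classmethod
    def parse(cls, raw: str) -> S3ConfigKey:
        # Conversion is the validation step: an unknown string never
        # becomes an S3ConfigKey, so later code can trust every key.
        try:
            return cls(raw.lower())
        except ValueError:
            raise ValueError(f"unknown S3 config key: {raw!r}") from None


def parse_config(raw: dict[str, str]) -> dict[S3ConfigKey, str]:
    """Convert user-supplied string keys into typed keys at the boundary."""
    return {S3ConfigKey.parse(key): value for key, value in raw.items()}


def build_store(config: dict[S3ConfigKey, str]) -> dict[str, str]:
    """Stand-in for a store builder; no validation is needed in the body."""
    return {key.value: value for key, value in config.items()}
```

Because the check lives in the key type, a function that accepts `dict[S3ConfigKey, str]` needs no validation of its own, which is the property the Rust wrapper gets from `FromPyObject`.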

Python facing differences

You can see my WIP API docs here

  • Fuller implementation, including stuff like multipart put (https://github.com/roeap/object-store-python/pull/14). So we can upload large files efficiently.
  • Uses Python native types where possible. This library overloads stuff like Path with custom Python classes. I want to handle whatever inputs the user already has, like str, and by handling this on the Rust side, any other Rust library that uses my integration will get it for free.
  • A streaming get implementation is WIP, based on #29. We should be able to provide an async or sync iterator to the user for streaming the bytes of a file or the items in a ListResult.
  • Doesn't need a full Python-side wrapper written in Python code, for easier maintenance.
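To make the sync-or-async streaming idea from the list above concrete, here is a minimal, self-contained sketch. It does not use the real library: `iter_chunks`, `aiter_chunks`, and `collect` are hypothetical helpers, and an in-memory `bytes` value stands in for a remote object whose chunks would arrive over the network.

```python
import asyncio
from typing import AsyncIterator, Iterator, List


def iter_chunks(data: bytes, chunk_size: int = 4) -> Iterator[bytes]:
    """Sync iterator: yield the payload one fixed-size chunk at a time."""
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]


async def aiter_chunks(data: bytes, chunk_size: int = 4) -> AsyncIterator[bytes]:
    """Async iterator over the same chunks."""
    for chunk in iter_chunks(data, chunk_size):
        # Stand-in for awaiting the next network read; hands control
        # back to the event loop between chunks.
        await asyncio.sleep(0)
        yield chunk


async def collect(stream: AsyncIterator[bytes]) -> List[bytes]:
    """Drain an async stream into a list (only for demonstration)."""
    return [chunk async for chunk in stream]
```

A caller would use `for chunk in iter_chunks(payload)` in sync code or `async for chunk in aiter_chunks(payload)` inside a coroutine, so the whole file never has to sit in memory at once.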

Rust-facing differences

I wanted a Rust-facing library because I want to use this from other Rust libraries exported to Python, including arro3, geoarrow-rs, icechunk, etc.

In pyo3-arrow I figured out a nice way to have pyo3 integration for Arrow data, where each Rust library doesn't need to export anything new to Python. But this works because Arrow is ABI stable, while ObjectStore is not. So having a Rust-facing pyo3 extension is slightly harder here because each Rust package will have to export its own Python classes that are built against its own library.

My crate uses the latest version of pyo3, v0.22. I can't publish this to crates.io yet because https://github.com/awestlake87/pyo3-asyncio is no longer maintained and the official fork https://github.com/PyO3/pyo3-async-runtimes hasn't published a 0.22 version yet (but is updated to 0.22 on git). I'm hoping that pyo3-async-runtimes will publish a 0.22 version very soon, and then I'll publish to crates.io.

All of these APIs under store are Python classes exported by pyo3-object_store, defined by register_store_module. And then all your own code has to do is accept PyObjectStore as a parameter, and then you can call into_inner to get an Arc<dyn ObjectStore>, and do whatever you want with it.

kylebarron avatar Oct 18 '24 15:10 kylebarron

Thank you for that @kylebarron. This is very helpful.

dsgibbons avatar Oct 18 '24 20:10 dsgibbons

I published my own version of an object_store wrapper, object-store-rs, to PyPI: https://github.com/developmentseed/object-store-rs

Edit: renamed to obstore: https://github.com/developmentseed/obstore

kylebarron avatar Oct 21 '24 18:10 kylebarron