object-store-python
Project Status
Hi there @roeap!
I really appreciate all the work you've done here. I was wondering a bit about the future of the project. Do you still plan on maintaining it and adding features or is it on-hold for now?
@norlandrhagen - fair point, as it has been a while since I have been active here - not for a lack of interest though, just life got in the way 😆.
So I started tonight by merging all the open PRs, and now I want to do one round of review to see if we need some maintenance somewhere, and then do another release asap.
Going forward, I am planning on doing maintenance, reviewing PRs in a timely manner, and implementing the occasional feature if it makes sense. I am not going to be able to spend a large amount of time on this, but at least it will be continuous.
As this essentially mirrors the object store APIs, I am hoping that this will be sufficient to keep this attractive and, foremost, useful to users ... does that help?
Thank you @roeap! Totally understand and great to know your plans.
Do you mind please updating the package
@roeap do you think it'd be a good idea to utilize Object Store from Apache as the backend? We could utilize PyO3.
This would take away the overhead of maintaining the Rust code (and any optimizations/issues that come with it).
In https://github.com/kylebarron/arro3/pull/229 I made a pyo3 integration for object-store, which is designed for other Rust developers making Python packages who want to use object-store in their own crates. See https://github.com/roeap/object-store-python/issues/3 for discussion on the original goals of that. I'll publish this to crates.io whenever pyo3-async-runtimes has its 0.22 release.
I'll likely make a Python-facing wrapper as well sometime soon. It won't have the same API as object-store-python, but I'm hoping my implementation will be easier to maintain and keep up to date.
> @roeap do you think it'd be a good idea to utilize Object Store from Apache as the backend? We could utilize PyO3.
@ByteBaker, isn't that what this project already does?
> In https://github.com/kylebarron/arro3/pull/229 I made a pyo3 integration for object-store, which is designed for other Rust developers making Python packages who want to use object-store in their own crates. See https://github.com/roeap/object-store-python/issues/3 for discussion on the original goals of that. I'll publish this to crates.io whenever pyo3-async-runtimes has its 0.22 release (https://github.com/PyO3/pyo3-async-runtimes/issues/1).
@kylebarron can you expand a bit more on the difference between your implementation and this project? I'm looking into writing Python bindings for SlateDB, which is also based on the object_store crate. I was hoping to take advantage of some prewritten PyO3 bindings for object_store.
Sure, @dsgibbons. And for clarity I moved the object-store integration into a separate repo here: https://github.com/developmentseed/object-store-rs
Overall differences
- Better maintained. This is subjective, and my project still has a bus factor of 1, but e.g. I've had PRs here sit for 8 months (https://github.com/roeap/object-store-python/pull/9)
- object-store-rs is not a fork of this repo; it's a reimplementation to try and have simpler end-user APIs and easier internal maintenance.
- IMO this repo is a bit overly complicated. The code across `lib.rs` and `builder.rs` totals like 1200 lines, and does a lot of manual re-implementation of the core `object-store` methods. The body of my Python function to create an S3 store is 13 lines of code. It's this simple because of smart use of the `FromPyObject` trait. `PyAmazonS3ConfigKey` is a tiny wrapper around `object_store::aws::AmazonS3ConfigKey`, which validates that the Python string input is indeed a valid key before it even reaches my function. Then my function can take in a `HashMap<PyAmazonS3ConfigKey, String>` and I don't need to do any validation in the body of my function; I can just pass it to `builder.with_config`.
- Having these simple wrappers around upstream `object_store` config structs should hopefully mean less maintenance as well. If `object_store` adds a new key to `object_store::aws::AmazonS3ConfigKey`, I don't need to change anything on my side to support the new version; the validation will automatically still work.
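To make the "validate at the boundary" idea above concrete, here is a minimal sketch using only the Rust standard library. It is not the code from object-store-rs: `UpstreamKey` is a hypothetical stand-in for an upstream enum like `object_store::aws::AmazonS3ConfigKey` (with only two illustrative variants), `ValidatedKey` plays the role of the thin wrapper, and `TryFrom<&str>` stands in for what a real binding would put in a `FromPyObject` impl.

```rust
use std::collections::HashMap;
use std::str::FromStr;

/// Hypothetical stand-in for an upstream config-key enum.
/// The variants are illustrative only.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum UpstreamKey {
    Bucket,
    Region,
}

impl FromStr for UpstreamKey {
    type Err = String;

    // The "upstream" parser: the single source of truth for valid keys.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.to_ascii_lowercase().as_str() {
            "bucket" => Ok(UpstreamKey::Bucket),
            "region" => Ok(UpstreamKey::Region),
            _ => Err(format!("unknown config key: {s}")),
        }
    }
}

/// Thin wrapper: constructing it *is* the validation step.
/// Assumption: in a real binding this conversion would live in a
/// `FromPyObject` impl rather than `TryFrom<&str>`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct ValidatedKey(pub UpstreamKey);

impl TryFrom<&str> for ValidatedKey {
    type Error = String;

    fn try_from(raw: &str) -> Result<Self, Self::Error> {
        // Delegate to the upstream parser, so new upstream keys need no changes here.
        raw.parse::<UpstreamKey>().map(ValidatedKey)
    }
}

/// Convert raw string pairs at the boundary; fails on the first invalid key.
pub fn validate_config(raw: &[(&str, &str)]) -> Result<HashMap<ValidatedKey, String>, String> {
    let mut typed = HashMap::new();
    for (key, value) in raw {
        typed.insert(ValidatedKey::try_from(*key)?, (*value).to_string());
    }
    Ok(typed)
}

/// A function body that receives typed keys has no validation left to do.
pub fn region_of(config: &HashMap<ValidatedKey, String>) -> Option<&str> {
    config.get(&ValidatedKey(UpstreamKey::Region)).map(String::as_str)
}
```

Because the wrapper delegates to the upstream parser, adding a variant upstream would extend the set of accepted keys without touching the wrapper or the consuming function.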
Python facing differences
You can see my WIP API docs here
- Fuller implementation, including stuff like multipart put (https://github.com/roeap/object-store-python/pull/14). So we can upload large files efficiently.
- Uses Python native types where possible. This library overloads stuff like `Path` with custom Python classes. I want to handle whatever inputs the user already has, like `str`, and by handling this on the Rust side, any other Rust library that uses my integration will get it for free.
- A streaming `get` implementation is WIP, based on #29. We should be able to provide an async or sync iterator to the user for streaming the bytes of a file or the items in a `ListResult`.
- Doesn't need a full Python-side wrapper in Python code, for easier maintenance.
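As a rough illustration of the sync-iterator idea for streaming the bytes of a file, the sketch below yields fixed-size chunks from any `std::io::Read` source. It is an assumption-laden stand-in, not the WIP implementation: `ChunkIter` and its `chunk_size` parameter are hypothetical names, and a real streaming `get` would pull bytes from the object store rather than from a local reader.

```rust
use std::io::{self, Read};

/// Hypothetical sync chunk iterator: yields up to `chunk_size` bytes per item.
pub struct ChunkIter<R: Read> {
    reader: R,
    chunk_size: usize,
    done: bool,
}

impl<R: Read> ChunkIter<R> {
    pub fn new(reader: R, chunk_size: usize) -> Self {
        assert!(chunk_size > 0, "chunk_size must be non-zero");
        Self { reader, chunk_size, done: false }
    }
}

impl<R: Read> Iterator for ChunkIter<R> {
    type Item = io::Result<Vec<u8>>;

    fn next(&mut self) -> Option<Self::Item> {
        if self.done {
            return None;
        }
        let mut buf = vec![0u8; self.chunk_size];
        let mut filled = 0;
        // Keep reading until the chunk is full or the source is exhausted.
        while filled < buf.len() {
            match self.reader.read(&mut buf[filled..]) {
                Ok(0) => {
                    self.done = true;
                    break;
                }
                Ok(n) => filled += n,
                Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
                Err(e) => {
                    self.done = true;
                    return Some(Err(e));
                }
            }
        }
        if filled == 0 {
            return None;
        }
        buf.truncate(filled);
        Some(Ok(buf))
    }
}
```

The caller only ever holds one chunk in memory at a time, which is the property that makes streaming attractive for large objects.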
Rust-facing differences
I wanted a rust-facing library because I want to use this from other Rust libraries exported to Python, including arro3, geoarrow-rs, icechunk, etc.
In pyo3-arrow I figured out a nice way to have pyo3-integration for Arrow data, where each Rust library doesn't need to export anything new to Python. But this works because Arrow is ABI stable, while ObjectStore is not. So having a rust-facing pyo3 extension is slightly harder here because each Rust package will have to export its own Python classes that are built against your own library.
My crate uses the latest version of pyo3, v0.22. I can't publish this to crates.io yet because https://github.com/awestlake87/pyo3-asyncio is no longer maintained and the official fork https://github.com/PyO3/pyo3-async-runtimes hasn't published an 0.22 version yet (but is updated to 0.22 on git). I'm hoping that pyo3-async-runtimes will publish an 0.22 version very soon, and then I'll publish to crates.io.
All of these APIs under `store` are Python classes exported by `pyo3-object_store`, defined by `register_store_module`. And then all your own code has to do is accept `PyObjectStore` as a parameter, and then you can call `into_inner` to get an `Arc<dyn ObjectStore>`, and do whatever you want with it.
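The accept-then-unwrap pattern described above might look roughly like the following std-only sketch. Only the names `PyObjectStore`, `into_inner`, and `Arc<dyn ObjectStore>` come from the comment; everything else is an assumption for illustration. In particular, the `ObjectStore` trait here is a tiny synchronous stand-in (the real trait is async and much richer), and `InMemory`, `PyObjectStore::new`, and `object_size` are hypothetical.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Heavily simplified, hypothetical stand-in for the real `ObjectStore` trait.
pub trait ObjectStore: Send + Sync {
    fn put(&self, path: &str, data: Vec<u8>);
    fn get(&self, path: &str) -> Option<Vec<u8>>;
}

/// Toy in-memory backend used only to exercise the wrapper.
#[derive(Default)]
pub struct InMemory {
    objects: Mutex<HashMap<String, Vec<u8>>>,
}

impl ObjectStore for InMemory {
    fn put(&self, path: &str, data: Vec<u8>) {
        self.objects.lock().unwrap().insert(path.to_string(), data);
    }

    fn get(&self, path: &str) -> Option<Vec<u8>> {
        self.objects.lock().unwrap().get(path).cloned()
    }
}

/// Stand-in for the wrapper type a binding crate would export to Python.
pub struct PyObjectStore(Arc<dyn ObjectStore>);

impl PyObjectStore {
    /// Hypothetical constructor; a real binding would build this from a Python object.
    pub fn new(store: Arc<dyn ObjectStore>) -> Self {
        Self(store)
    }

    /// Hand the shared store to downstream code.
    pub fn into_inner(self) -> Arc<dyn ObjectStore> {
        self.0
    }
}

/// What a downstream crate's function might do: accept the wrapper,
/// unwrap it, then work against the trait object only.
pub fn object_size(store: PyObjectStore, path: &str) -> Option<usize> {
    let store: Arc<dyn ObjectStore> = store.into_inner();
    store.get(path).map(|bytes| bytes.len())
}
```

Since the downstream function only sees `Arc<dyn ObjectStore>`, it does not need to know which backend the Python caller constructed.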
Thank you for that @kylebarron. This is very helpful.
I published my own version of an object_store wrapper, object-store-rs to PyPI: https://github.com/developmentseed/object-store-rs
Edit: renamed to obstore: https://github.com/developmentseed/obstore