polars
feat(rust): generalize the cloud storage builders
This PR generalizes the parquet support for cloud URLs. It enables all the backends supported by object_store. Note that there is still one open issue with the Azure builder; I will re-enable it once the upstream PR is resolved.
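To illustrate what "generalizing" over backends can look like, here is a minimal sketch that picks a backend from the URL scheme. The `CloudBackend` enum and the scheme list are assumptions made for illustration only; they are not necessarily the names or the set of schemes used in this PR or in object_store.

```rust
use std::str::FromStr;

/// Hypothetical backend selector, for illustration only.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum CloudBackend {
    Aws,
    Azure,
    Gcp,
    LocalFile,
}

impl FromStr for CloudBackend {
    type Err = String;

    fn from_str(url: &str) -> Result<Self, Self::Err> {
        // Everything before "://" is treated as the scheme.
        let (scheme, _rest) = url
            .split_once("://")
            .ok_or_else(|| format!("missing scheme in url: {url}"))?;

        // The scheme list below is an assumption, not an exhaustive mapping.
        match scheme {
            "s3" | "s3a" => Ok(CloudBackend::Aws),
            "az" | "adl" | "abfs" => Ok(CloudBackend::Azure),
            "gs" | "gcs" => Ok(CloudBackend::Gcp),
            "file" => Ok(CloudBackend::LocalFile),
            other => Err(format!("unsupported scheme: {other}")),
        }
    }
}
```

With a single dispatch point like this, supporting another backend would mostly mean adding one match arm and the corresponding builder call.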
This PR threads the required cloud options through the code instead of reading them from the process environment. This reduces the magic provided by global state (std::env) at the cost of more changes across the different layers. I personally prefer passing options/settings explicitly.
This PR also breaks the existing object_store.rs into a module; that file was getting bigger and had poor cohesion.
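A rough sketch of the trade-off between the two styles follows. The `CloudOptions` type, its methods, and the `"region"` key are hypothetical and exist only for this example; the actual types in the PR may look different.

```rust
use std::collections::HashMap;

/// Hypothetical options bag, for illustration only.
#[derive(Debug, Default, Clone)]
pub struct CloudOptions {
    settings: HashMap<String, String>,
}

impl CloudOptions {
    /// Builder-style setter that returns the updated options.
    pub fn with(mut self, key: &str, value: &str) -> Self {
        self.settings.insert(key.to_string(), value.to_string());
        self
    }

    /// Look up a single setting by key.
    pub fn get(&self, key: &str) -> Option<&str> {
        self.settings.get(key).map(String::as_str)
    }
}

/// Implicit style: the result depends on global process state,
/// so callers cannot tell from the signature what is being read.
pub fn region_from_env() -> Option<String> {
    std::env::var("AWS_REGION").ok()
}

/// Explicit style: every input arrives as an argument, so the
/// function is easy to test and two callers can use different settings.
pub fn region_from_options(options: &CloudOptions) -> Option<String> {
    options.get("region").map(str::to_string)
}
```

The explicit version costs an extra parameter at every layer between the public API and the store builder, which is where most of the churn in a change like this tends to come from.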
Feedback appreciated.
I got some help from the developers of the object_store crate; they provided an API that is nicer for us to use. Many thanks :)
This code is ready for review at your convenience. I am not sure how to fully enable it on the Python side... @ritchie46
@ritchie46 addressed feedback, tests are passing, ready for more feedback or merge 🤗
@winding-lines When object_store adds more backends in the future, will that require changes in polars as well, or is it generalized now?
I ask because I see that HTTP has also been added in their main/master branch.
@chitralverma In the current iteration, the deltalake and object_store teams have moved some code out of the former and into the latter. I think we can also contribute our current layer upstream so that any future changes will be integrated just by recompiling.
Given that this PR has been open for a while, my preference would be to merge it and then release a polars version so that I can do more testing at work.
Let me know what you think :)
@ritchie46 addressed feedback, tests are passing :-)
Many thanks!
thanks @ritchie46 ❤️