delta-sharing
Add load_as methods for pyarrow dataset and table
Adds separate implementations, load_as_pyarrow_table and load_as_pyarrow_dataset, that allow users to read Delta Sharing tables as a PyArrow Table and a PyArrow Dataset, respectively.
- [x] Add basic implementation
- [x] Fix lint
- [x] Refactor common code
- [x] Verify performance with and without limit
- [x] Add tests - converter
- [x] Add tests - reader
- [ ] Add tests - delta_sharing
- [x] Add examples
- [ ] Fix review comments
closes https://github.com/delta-io/delta-sharing/issues/238
@goodwillpunning @linzhou-db From the build logs, I can see that PYARROW_VERSION has been pinned to 4.x somewhere in the environment variables. That version of PyArrow came out in May 2021, and there have been six major releases since then.
There seem to be API inconsistencies in the pinned 4.x version that cause the build failure on GitHub, although the test cases pass locally. I also tested with versions 5.x through 10.x and could not reproduce the issue. Could you please unpin or upgrade PYARROW_VERSION?
Thanks @chitralverma , will take a look once back in Jan. cc @zsxwing
Also, what are your thoughts on loading CDF in PyArrow? Is it something not needed for now?
I would prefer to raise a separate PR for CDF to keep things simple and concise; this one covers only the table data.
@chitralverma @linzhou-db can we revive this PR?