parquet-rs
Add Arrow Support
This is the umbrella ticket to track adding Apache Arrow support. Tasks:
- [ ] Add Arrow schema converter for read path (#185).
- [ ] Add Arrow schema converter for write path.
- [ ] Support reading & writing Arrow in encoders/decoders (#191).
- [ ] Support record reader for Arrow.
- [ ] Support record writer for Arrow.
- [ ] Update documentation for the new feature and how to use it.
I think the next tasks will be:
- Add reader that reads parquet into arrow.
- Complete the converter to convert arrow schema to parquet schema.
- Add writer to save arrow data to parquet format.
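To make the schema conversion task concrete, here is a minimal sketch of mapping Arrow field types to Parquet physical types. Every type and function name below (`ArrowType`, `ArrowField`, `ParquetColumn`, `to_parquet_schema`) is a hypothetical stand-in made up for illustration, not the arrow or parquet-rs API, and only a handful of primitive types are covered.

```rust
/// Hypothetical, simplified subset of Arrow data types (illustration only).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ArrowType { Boolean, Int32, Int64, Float32, Float64, Utf8 }

/// Subset of the Parquet physical types.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum PhysicalType { Boolean, Int32, Int64, Float, Double, ByteArray }

/// Parquet repetition for a flat (non-nested) column.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Repetition { Required, Optional }

/// Hypothetical Arrow field: a name, a type, and a nullability flag.
#[derive(Debug, Clone, PartialEq)]
pub struct ArrowField {
    pub name: String,
    pub data_type: ArrowType,
    pub nullable: bool,
}

/// Hypothetical Parquet column description produced by the converter.
#[derive(Debug, Clone, PartialEq)]
pub struct ParquetColumn {
    pub name: String,
    pub physical_type: PhysicalType,
    pub repetition: Repetition,
    /// True when the BYTE_ARRAY should carry a UTF8 annotation.
    pub is_utf8: bool,
}

/// Convert one field: pick the physical type and map nullability to repetition.
pub fn to_parquet_column(field: &ArrowField) -> ParquetColumn {
    let physical_type = match field.data_type {
        ArrowType::Boolean => PhysicalType::Boolean,
        ArrowType::Int32 => PhysicalType::Int32,
        ArrowType::Int64 => PhysicalType::Int64,
        ArrowType::Float32 => PhysicalType::Float,
        ArrowType::Float64 => PhysicalType::Double,
        // Strings are stored as byte arrays annotated as UTF8.
        ArrowType::Utf8 => PhysicalType::ByteArray,
    };
    let repetition = if field.nullable {
        Repetition::Optional
    } else {
        Repetition::Required
    };
    ParquetColumn {
        name: field.name.clone(),
        physical_type,
        repetition,
        is_utf8: field.data_type == ArrowType::Utf8,
    }
}

/// Convert a flat list of fields, preserving column order.
pub fn to_parquet_schema(fields: &[ArrowField]) -> Vec<ParquetColumn> {
    fields.iter().map(to_parquet_column).collect()
}
```

The read path would be the inverse mapping. Nested types (lists, structs) need definition and repetition levels and are deliberately left out of this sketch.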
Thanks @liurenjie1024. Updated the description with some potential tasks.
I suggest adding an item to update the existing doc to reflect the addition of arrow reader/writer.
DataFusion has code for loading parquet into arrow ... might be worth looking at
On Thu, Nov 8, 2018 at 4:47 AM Ivan wrote:
> I suggest adding an item to update the existing doc to reflect the addition of arrow reader/writer.
@sadikovi Thanks - added. @andygrove cool - will take a look.
@andygrove Yes, I'll take that as a reference. I'll also reference the C++ implementation of the Arrow adapter for Parquet.
I am very interested in this. I am wondering if we can add a generic reader trait to the main arrow project and then have an implementation in parquet-rs.
I have a CSV reader for arrow that could be published as a separate crate and implement the same trait.
Actually, maybe this is as simple as implementing Iterator<Item = Arc<RecordBatch>>
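A rough sketch of what that could look like, assuming the shared abstraction is nothing more than an iterator over reference-counted batches. `RecordBatch` here is a toy stand-in with a single `i32` column rather than the real Arrow type, and `BatchReader`, `InMemoryReader`, and `total_rows` are hypothetical names made up for illustration.

```rust
use std::sync::Arc;

/// Toy stand-in for Arrow's RecordBatch: a single i32 column (illustration only).
#[derive(Debug, PartialEq)]
pub struct RecordBatch {
    pub values: Vec<i32>,
}

/// Hypothetical generic reader trait: anything that yields shared batches.
pub trait BatchReader: Iterator<Item = Arc<RecordBatch>> {}

/// Blanket impl, so any matching iterator is automatically a BatchReader.
impl<T: Iterator<Item = Arc<RecordBatch>>> BatchReader for T {}

/// Example source that splits an in-memory column into fixed-size batches.
/// A Parquet or CSV reader would implement Iterator in the same way.
pub struct InMemoryReader {
    data: Vec<i32>,
    batch_size: usize,
    offset: usize,
}

impl InMemoryReader {
    pub fn new(data: Vec<i32>, batch_size: usize) -> Self {
        assert!(batch_size > 0, "batch_size must be non-zero");
        InMemoryReader { data, batch_size, offset: 0 }
    }
}

impl Iterator for InMemoryReader {
    type Item = Arc<RecordBatch>;

    fn next(&mut self) -> Option<Self::Item> {
        if self.offset >= self.data.len() {
            return None; // source exhausted
        }
        let end = (self.offset + self.batch_size).min(self.data.len());
        let batch = RecordBatch {
            values: self.data[self.offset..end].to_vec(),
        };
        self.offset = end;
        Some(Arc::new(batch))
    }
}

/// Consumer written only against the trait, not a concrete file format.
pub fn total_rows<R: BatchReader>(reader: R) -> usize {
    reader.map(|batch| batch.values.len()).sum()
}
```

With this shape, `total_rows(InMemoryReader::new(vec![1, 2, 3, 4, 5], 2))` walks three batches of sizes 2, 2, and 1. One open design question is error handling: file-backed readers can fail mid-stream, so the item type would probably need to be a `Result` wrapping the batch rather than the batch itself.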