data-prep-kit
[Feature] pyarrow parquet write_table can save up to 30% storage with compression flag ‘ZSTD’
Search before asking
- [X] I searched the issues and found no similar issues.
Component
Library/core
Feature
https://github.com/IBM/data-prep-kit/blob/dev/data-processing-lib/python/src/data_processing/utils/transform_utils.py#L151
# convert table to bytes
writer = pa.BufferOutputStream()
pq.write_table(table=table, where=writer, compression='ZSTD')
return bytes(writer.getvalue())
ZSTD can save up to 30% storage space compared to Snappy, which is the default codec pq.write_table uses when no compression argument is passed.
Submitted as proposed by R. Jain and M. L. Hershcovitch.
Are you willing to submit a PR?
- [X] Yes I am willing to submit a PR!
PR #404 opened for review.
I believe this is fixed in #404 and #441
This is available after 0.2.0 in TransformUtils.convert_arrow_to_binary()