parquet-format
PARQUET-2414: Extend BYTE_STREAM_SPLIT to support INT32, INT64 and FIXED_LEN_BYTE_ARRAY data
+1 I think this is great. Are PoCs needed for this? I'm interested in seeing how well this works as a DELTA_BINARY_PACKED replacement for my data.
@etseidl I've written the implementation for Parquet C++ here: https://github.com/apache/arrow/pull/40094
I was planning to implement it for Parquet Java, but you may want to do it as well.
Sounds good. I'll put it in my queue. I'll check out your arrow implementation to see if there are any pitfalls to avoid. Thanks!
Thank you @pitrou for investigating this! Extending BYTE_STREAM_SPLIT to more data types will give us great new options in RAPIDS.
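For readers new to the encoding discussed in this thread, here is a minimal sketch of what BYTE_STREAM_SPLIT does to fixed-width values such as INT32. It assumes the values are already serialized as little-endian fixed-width bytes (as in PLAIN encoding); the function names `bss_encode` and `bss_decode` are hypothetical and are not taken from the Arrow or Parquet Java implementations. Byte `i` of every value goes into stream `i`, and the streams are concatenated in order.

```python
import struct


def bss_encode(raw: bytes, width: int) -> bytes:
    """Split fixed-width values into `width` byte streams and concatenate them.

    `raw` holds N values of `width` bytes each; stream i collects byte i of
    every value. Hypothetical helper for illustration only.
    """
    if len(raw) % width != 0:
        raise ValueError("input length must be a multiple of the value width")
    return b"".join(raw[i::width] for i in range(width))


def bss_decode(data: bytes, width: int) -> bytes:
    """Inverse of bss_encode: interleave the streams back into whole values."""
    if len(data) % width != 0:
        raise ValueError("input length must be a multiple of the value width")
    n = len(data) // width  # number of values
    out = bytearray(len(data))
    for i in range(width):
        # Stream i occupies bytes [i*n, (i+1)*n) of the encoded buffer.
        out[i::width] = data[i * n:(i + 1) * n]
    return bytes(out)


# Three small INT32 values, serialized little-endian.
values = [1, 2, 3]
raw = struct.pack("<3i", *values)
encoded = bss_encode(raw, 4)
decoded = list(struct.unpack("<3i", bss_decode(encoded, 4)))
```

With small integers, the high-order byte streams are runs of zeros, which is why a general-purpose compressor applied afterwards may do well on this layout. Whether it beats DELTA_BINARY_PACKED depends on the data.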