Add FAPEC compressor support to Parquet?
Describe the enhancement requested
FAPEC is a high-performance data compression algorithm with many options. It is based on efficient entropy coding and includes several pre-processing algorithms for time series, images, text, floats, etc. It's already available for some formats such as HDF5 and FITS. We're now investigating the possibility of including FAPEC as a new codec option for Parquet, which (we think) should provide better compression ratios than the currently available codecs, mainly for integers and floats/doubles. We're working on a proof-of-concept to evaluate how much it would actually improve things.

If the outcome reveals a "significant enough" improvement, would it be possible to include this new option? Note that FAPEC is a commercial product, so a valid license would be needed to generate Parquet files with it, and the compression library would have to be used in binary form for the appropriate platform. (Reading Parquet files compressed with FAPEC would always be free.) Perhaps this would be a blocking issue...?
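For reference, the kind of baseline measurement such a proof-of-concept would compare against could be sketched roughly as below. This is only an illustration under stated assumptions: it uses Python standard-library codecs (`zlib`, `bz2`, `lzma`) as stand-ins rather than Parquet's actual codec implementations, the synthetic columns and the names `sample_columns`, `compression_ratio` and `baseline_ratios` are made up for this example, and no FAPEC call is shown.

```python
import array
import bz2
import lzma
import math
import zlib


def compression_ratio(raw: bytes, compress) -> float:
    """Return uncompressed size divided by compressed size."""
    return len(raw) / len(compress(raw))


def sample_columns(n: int = 50_000) -> dict:
    """Build two synthetic columns as raw bytes (hypothetical test data)."""
    # Slowly varying integers stored with the platform's C int width.
    ints = array.array("i", (1000 + (i // 7) % 500 for i in range(n)))
    # Smooth sensor-like readings stored as 64-bit doubles.
    floats = array.array("d", (round(20 + 5 * math.sin(i / 200.0), 3) for i in range(n)))
    return {"int": ints.tobytes(), "double": floats.tobytes()}


# Standard-library codecs used here only as stand-ins for real Parquet codecs.
CODECS = {
    "zlib": lambda b: zlib.compress(b, 6),
    "bz2": bz2.compress,
    "lzma": lzma.compress,
}


def baseline_ratios(n: int = 50_000) -> dict:
    """Map (column, codec) to the compression ratio achieved on that column."""
    columns = sample_columns(n)
    return {
        (col, name): compression_ratio(raw, fn)
        for col, raw in columns.items()
        for name, fn in CODECS.items()
    }


if __name__ == "__main__":
    for (col, name), ratio in sorted(baseline_ratios().items()):
        print(f"{col:>6} {name:>5} ratio={ratio:.2f}")
```

A real evaluation would of course run on representative Parquet files with the codecs Parquet actually ships, and would also record compression and decompression throughput alongside the ratio.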
Component(s)
No response
Thanks for creating an issue about a new compression codec. A similar discussion took place on the mailing list: https://lists.apache.org/thread/ht95wm8trfx2z4pq91t7170t2qjqg4yw. I think the replies there cover some general concerns about adding a new codec.