parquet-java
Apache Parquet
This will improve write performance by 1/4 to 1/3, based on my testing. It makes a huge difference when dealing with really large files (several GB). In the original...
Add an option to not silently swallow errors when writing the metadata files. The code isn't as clean as it could be, but I was optimizing for making sure...
Support the Hadoop zero-copy read API in SeekableInputStream
A fairly straightforward tool, but quite useful!
parquet-benchmarks only contains single-threaded read and write benchmarks. This adds concurrent Parquet file scans, as in typical data-parallel computing.
Parquet encryption supports column-level access control: data is encrypted at the column level, so only a chosen subset of columns needs to be encrypted. If you want a column to encrypt...
Fixes a very small corner case in parquet-avro when reflection on abstract generic fields is used.
The parquet-cascading module currently includes code that lets Cascading write and read Thrift-based data to Parquet files. Protobuf is a popular alternative to Thrift that can be explicitly supported by...