parquet-java

Validate parquet row group size and HDFS block size

Open • asfimport opened this issue 10 years ago • 1 comment

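The OutputFormat should verify that parquet.block.size < dfs.blocksize to avoid poor read performance: a row group larger than an HDFS block necessarily spans a block boundary, so reading it requires data from more than one HDFS block. In addition, we could check that (dfs.blocksize % parquet.block.size) < 1MB, which ensures that a whole number of row groups fills an HDFS block with less than 1MB left over.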
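A minimal sketch of the two proposed checks, written as a hypothetical standalone helper. `RowGroupSizeCheck` and its methods are not part of parquet-java, and they take raw byte counts instead of reading `parquet.block.size` and `dfs.blocksize` from a Hadoop `Configuration`. The "1MB" tolerance is assumed here to mean 1 MiB.

```java
// Hypothetical helper, not part of parquet-java: sketches the two checks
// proposed in this issue using plain byte counts.
public final class RowGroupSizeCheck {

    // Assumed tolerance for the second check: 1 MiB of leftover space.
    static final long MAX_REMAINDER_BYTES = 1024L * 1024L;

    private RowGroupSizeCheck() {}

    /** First check: a row group must be strictly smaller than an HDFS block. */
    static boolean rowGroupFitsInBlock(long parquetBlockSize, long dfsBlockSize) {
        return parquetBlockSize < dfsBlockSize;
    }

    /**
     * Second check: after packing as many whole row groups as fit into one
     * HDFS block, less than MAX_REMAINDER_BYTES of the block is left over.
     */
    static boolean rowGroupsTileBlock(long parquetBlockSize, long dfsBlockSize) {
        return (dfsBlockSize % parquetBlockSize) < MAX_REMAINDER_BYTES;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long dfsBlockSize = 128 * mb;

        // 128 MB row groups are not strictly smaller than a 128 MB block.
        System.out.println(rowGroupFitsInBlock(128 * mb, dfsBlockSize)); // false
        // Two 64 MB row groups fill the block exactly.
        System.out.println(rowGroupsTileBlock(64 * mb, dfsBlockSize));   // true
        // Two 50 MB row groups leave 28 MB of the block unused.
        System.out.println(rowGroupsTileBlock(50 * mb, dfsBlockSize));   // false
    }
}
```

With a 128 MB HDFS block, 64 MB row groups pass both checks, while 50 MB row groups pass the first check but fail the second, because two of them leave 28 MB of each block unused.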

Reporter: Ryan Blue / @rdblue

Related issues:

Note: This issue was originally created as PARQUET-166. Please see the migration documentation for further details.

asfimport • Jan 09 '15 22:01

Ryan Blue / @rdblue: The first part of this, ensuring that the row group size is less than the block size, was added in PARQUET-306 along with row group padding. We should determine whether the second part (checking that the block size is close to a multiple of the row group size) is worth doing. People will ignore warnings and would not appreciate errors.
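Row group padding can be illustrated with a simplified sketch. This is not parquet-java's implementation: `PaddingSketch`, `paddingBytes`, and the 8 MB padding limit are hypothetical, and the sketch ignores the size of the next row group. The idea assumed here is that when the gap to the next HDFS block boundary is small, the writer fills it with padding so the next row group starts on the boundary instead of straddling it.

```java
// Simplified illustration of row group padding; not parquet-java's actual
// implementation. All names and limits here are hypothetical.
public final class PaddingSketch {

    private PaddingSketch() {}

    /**
     * Returns how many padding bytes to write at file offset pos so that the
     * next row group begins on an HDFS block boundary. Returns 0 when already
     * on a boundary, or when the gap exceeds maxPadding (too wasteful to pad).
     */
    static long paddingBytes(long pos, long dfsBlockSize, long maxPadding) {
        long remaining = dfsBlockSize - (pos % dfsBlockSize);
        if (remaining == dfsBlockSize) {
            return 0; // already on a block boundary
        }
        return remaining <= maxPadding ? remaining : 0;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long block = 128 * mb;
        long maxPadding = 8 * mb; // assumed limit for this example

        System.out.println(paddingBytes(123 * mb, block, maxPadding)); // 5 MB gap: 5242880
        System.out.println(paddingBytes(100 * mb, block, maxPadding)); // 28 MB gap: 0
        System.out.println(paddingBytes(128 * mb, block, maxPadding)); // on boundary: 0
    }
}
```

Capping the padding trades a little wasted space for keeping row groups within a single HDFS block; a large gap is left unpadded because filling it would waste too much of the block.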

asfimport • Jul 20 '15 23:07