secor
Secor Avro Parquet doesn't use any configuration for parquet writer
Hi, I have a question about this part of the code:
```java
public AvroParquetFileWriter(LogFilePath logFilePath, CompressionCodec codec) throws IOException {
    ...
    // Not setting blockSize, pageSize, enableDictionary, and validating
    writer = AvroParquetWriter.builder(path)
            .withSchema(schemaRegistry.getSchema(topic))
            .withCompressionCodec(codecName)
            .build();
}
```
As you can see, Secor intentionally doesn't set blockSize, pageSize, enableDictionary, or validating, which makes these configuration options useless. The code comment says as much, but no reason is given. Why are the Parquet settings not used? Thanks
The Parquet support code was checked in a while ago by a community member; you can check the commit history of this file to find the original author.
I think the reason there are no extra configs is that the author wanted simpler code without passing and setting extra configuration, since he didn't need them.
You can submit a PR that adds the config and code needed to set the Parquet configurations.
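As a starting point for such a PR, here is a minimal sketch of how the four settings named in the code comment could be read from configuration before being handed to the writer builder. The class `ParquetWriterSettings`, the property keys, and the default values are assumptions made for illustration; check them against Secor's actual config and the parquet-mr defaults before relying on them.

```java
import java.util.Properties;

/** Hypothetical holder for the Parquet writer options; not part of Secor. */
final class ParquetWriterSettings {
    final int blockSize;
    final int pageSize;
    final boolean enableDictionary;
    final boolean validating;

    private ParquetWriterSettings(int blockSize, int pageSize,
                                  boolean enableDictionary, boolean validating) {
        this.blockSize = blockSize;
        this.pageSize = pageSize;
        this.enableDictionary = enableDictionary;
        this.validating = validating;
    }

    /**
     * Reads the settings from properties, falling back to defaults.
     * Key names and defaults (128 MiB block, 1 MiB page) are assumptions.
     */
    static ParquetWriterSettings fromProperties(Properties props) {
        return new ParquetWriterSettings(
                Integer.parseInt(props.getProperty("parquet.block.size", "134217728").trim()),
                Integer.parseInt(props.getProperty("parquet.page.size", "1048576").trim()),
                Boolean.parseBoolean(props.getProperty("parquet.enable.dictionary", "true").trim()),
                Boolean.parseBoolean(props.getProperty("parquet.validation", "false").trim()));
    }

    // In AvroParquetFileWriter the values could then be applied roughly like this
    // (builder methods from parquet-mr's ParquetWriter.Builder):
    //
    //   writer = AvroParquetWriter.builder(path)
    //           .withSchema(schemaRegistry.getSchema(topic))
    //           .withCompressionCodec(codecName)
    //           .withRowGroupSize(settings.blockSize)
    //           .withPageSize(settings.pageSize)
    //           .withDictionaryEncoding(settings.enableDictionary)
    //           .withValidation(settings.validating)
    //           .build();
}
```

Keeping the parsing separate from the writer makes the defaults easy to unit test, and the constructor change in `AvroParquetFileWriter` stays limited to the four extra builder calls.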
On Mon, Nov 23, 2020 at 5:54 AM Richard Grossman wrote:
Hi, I have a question about this part of the code:
    public AvroParquetFileWriter(LogFilePath logFilePath, CompressionCodec codec) throws IOException {
        Path path = new Path(logFilePath.getLogFilePath());
        LOG.debug("Creating Brand new Writer for path {}", path);
        CompressionCodecName codecName = CompressionCodecName
                .fromCompressionCodec(codec != null ? codec.getClass() : null);
        topic = logFilePath.getTopic();
        // Not setting blockSize, pageSize, enableDictionary, and validating
        writer = AvroParquetWriter.builder(path)
                .withSchema(schemaRegistry.getSchema(topic))
                .withCompressionCodec(codecName)
                .build();
    }
As you can see, Secor intentionally doesn't set blockSize, pageSize, enableDictionary, or validating, which makes these configuration options useless. The code comment says as much, but no reason is given. Why are the Parquet settings not used? Thanks
View it on GitHub: https://github.com/pinterest/secor/issues/1719