
Secor Avro Parquet doesn't use any configuration for parquet writer

Open richiesgr opened this issue 4 years ago • 1 comments

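Hi, I have a question about this part of the code:

```java
public AvroParquetFileWriter(LogFilePath logFilePath, CompressionCodec codec) throws IOException {
    Path path = new Path(logFilePath.getLogFilePath());
    LOG.debug("Creating Brand new Writer for path {}", path);
    CompressionCodecName codecName = CompressionCodecName
            .fromCompressionCodec(codec != null ? codec.getClass() : null);
    topic = logFilePath.getTopic();
    // Not setting blockSize, pageSize, enableDictionary, and validating
    writer = AvroParquetWriter.builder(path)
            .withSchema(schemaRegistry.getSchema(topic))
            .withCompressionCodec(codecName)
            .build();
}
```

As you can see, Secor intentionally doesn't set blockSize, pageSize, enableDictionary, or validating on the writer, which makes these configuration options useless for Avro Parquet output. The code comment says as much, but no reason is given. Why are the Parquet settings not used? Thanks.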

richiesgr, Nov 23 '20 13:11

The Parquet support code was checked in a while ago by a community member; you can look at the commit history of this file to find the original author.

I think the reason there are no extra configs is that the author wanted simple code without extra configuration passing and setting, since he didn't need them.

I think you can submit a PR that adds the config and code needed to set the Parquet configurations.
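For anyone picking this up, here is a minimal sketch of what such a change could look like. It is not Secor's actual code: the `ParquetWriterSettings` class and the four property keys are hypothetical names chosen for illustration, and the fallback values are what I understand parquet-mr's `ParquetWriter` defaults to be. The builder calls in the trailing comment (`withRowGroupSize`, `withPageSize`, `withDictionaryEncoding`, `withValidation`) are methods on parquet-mr's `ParquetWriter.Builder`; they are left as a comment so the block compiles without the Parquet dependency.

```java
import java.util.Properties;

/**
 * Sketch only: holds the four writer settings named in the issue.
 * Class name and property keys are hypothetical, not confirmed Secor names.
 */
final class ParquetWriterSettings {
    // Assumed to match parquet-mr's ParquetWriter defaults.
    static final int DEFAULT_BLOCK_SIZE = 128 * 1024 * 1024; // row group size in bytes
    static final int DEFAULT_PAGE_SIZE = 1024 * 1024;        // page size in bytes

    final int blockSize;
    final int pageSize;
    final boolean enableDictionary;
    final boolean validating;

    ParquetWriterSettings(int blockSize, int pageSize,
                          boolean enableDictionary, boolean validating) {
        if (blockSize <= 0 || pageSize <= 0) {
            throw new IllegalArgumentException("blockSize and pageSize must be positive");
        }
        this.blockSize = blockSize;
        this.pageSize = pageSize;
        this.enableDictionary = enableDictionary;
        this.validating = validating;
    }

    /** Reads the settings from properties, falling back to the defaults above. */
    static ParquetWriterSettings fromProperties(Properties props) {
        int blockSize = Integer.parseInt(
                props.getProperty("parquet.block.size", String.valueOf(DEFAULT_BLOCK_SIZE)).trim());
        int pageSize = Integer.parseInt(
                props.getProperty("parquet.page.size", String.valueOf(DEFAULT_PAGE_SIZE)).trim());
        boolean enableDictionary = Boolean.parseBoolean(
                props.getProperty("parquet.enable.dictionary", "true").trim());
        boolean validating = Boolean.parseBoolean(
                props.getProperty("parquet.validation", "false").trim());
        return new ParquetWriterSettings(blockSize, pageSize, enableDictionary, validating);
    }
}

// Inside AvroParquetFileWriter, the values could then be handed to the builder:
//
//   writer = AvroParquetWriter.builder(path)
//           .withSchema(schemaRegistry.getSchema(topic))
//           .withCompressionCodec(codecName)
//           .withRowGroupSize(settings.blockSize)
//           .withPageSize(settings.pageSize)
//           .withDictionaryEncoding(settings.enableDictionary)
//           .withValidation(settings.validating)
//           .build();
```

Falling back to the library defaults when a key is absent would keep existing deployments behaving as they do today, so the change would only take effect for users who set the properties explicitly.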


HenryCaiHaiying, Nov 23 '20 21:11