
Doc: Add Parquet, ORC, and Avro delete codec and level properties

Open renshangtao opened this issue 3 years ago • 3 comments

Add the Parquet, ORC, and Avro delete compression codec and level properties to the configuration documentation.

renshangtao, Jun 29 '22 10:06

Although we can configure these separately for delete files, I don't recommend exposing them. If we must, their default values should be the same as the data file settings, e.g.:

    long stripeSize = PropertyUtil.propertyAsLong(config, DELETE_ORC_STRIPE_SIZE_BYTES, dataContext.stripeSize());

So maybe it should look like this:

| Property | Default | Description |
|---|---|---|
| write.delete.orc.compression-codec | write.orc.compression-codec | ORC compression codec: zstd, lz4, lzo, zlib, snappy, none |
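A minimal, self-contained Java sketch of the fallback described above, assuming delete-file settings are read from a plain `Map<String, String>` and inherit the already-resolved data-file values when unset. The `propertyAsLong`/`propertyAsString` helpers only mimic the shape of Iceberg's `PropertyUtil`, the `DataContext` record is a hypothetical stand-in for the writer's data context, and the stripe-size property string is assumed; none of this is Iceberg's actual implementation. Records require Java 16 or later.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: resolve ORC settings for delete files, falling back
// to the values already resolved for data files. Not Iceberg's real code.
public class OrcDeleteConfig {
  // Property names as discussed in the issue; the stripe-size string is assumed.
  static final String DELETE_ORC_STRIPE_SIZE_BYTES = "write.delete.orc.stripe-size-bytes";
  static final String DELETE_ORC_COMPRESSION = "write.delete.orc.compression-codec";

  // Hypothetical stand-in for the data-file writer context.
  record DataContext(long stripeSize, String codec) {}

  // Returns the parsed value if the key is set, otherwise the supplied default
  // (same shape as PropertyUtil.propertyAsLong).
  static long propertyAsLong(Map<String, String> config, String key, long defaultValue) {
    String value = config.get(key);
    return value != null ? Long.parseLong(value) : defaultValue;
  }

  // String variant of the same fallback rule.
  static String propertyAsString(Map<String, String> config, String key, String defaultValue) {
    String value = config.get(key);
    return value != null ? value : defaultValue;
  }

  // Each delete setting defaults to the corresponding data-file setting.
  static DataContext deleteContext(Map<String, String> config, DataContext dataContext) {
    long stripeSize = propertyAsLong(config, DELETE_ORC_STRIPE_SIZE_BYTES, dataContext.stripeSize());
    String codec = propertyAsString(config, DELETE_ORC_COMPRESSION, dataContext.codec());
    return new DataContext(stripeSize, codec);
  }

  public static void main(String[] args) {
    DataContext data = new DataContext(64L * 1024 * 1024, "zstd");
    Map<String, String> config = new HashMap<>();

    // No delete properties set: both values are inherited from the data context.
    System.out.println(deleteContext(config, data));

    // Override only the delete codec; the stripe size is still inherited.
    config.put(DELETE_ORC_COMPRESSION, "lz4");
    System.out.println(deleteContext(config, data));
  }
}
```

The point of this pattern is that users who never set a `write.delete.orc.*` property get delete files written exactly like data files, which is why the table lists the data property as the default.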

hililiwei, Jul 02 '22 03:07

@hililiwei Yes, you are right. Parquet and Avro work the same way as ORC, e.g.:

    String codecAsString = config.get(DELETE_PARQUET_COMPRESSION);
    CompressionCodecName codec = codecAsString != null ? toCodec(codecAsString) : dataContext.codec();
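The Parquet snippet above can be sketched as a runnable Java example. This assumes a small hypothetical `Codec` enum in place of Parquet's `CompressionCodecName` and a hypothetical `toCodec` that upper-cases the property value and looks up the enum constant; the property string for `DELETE_PARQUET_COMPRESSION` is also assumed here.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of the Parquet delete-codec fallback; not Iceberg's real code.
public class ParquetDeleteCodec {
  // Assumed property name for the delete-file Parquet codec.
  static final String DELETE_PARQUET_COMPRESSION = "write.delete.parquet.compression-codec";

  // Hypothetical stand-in for a subset of Parquet's CompressionCodecName.
  enum Codec { UNCOMPRESSED, SNAPPY, GZIP, ZSTD, LZ4 }

  // Hypothetical toCodec: maps a property value such as "snappy" to an enum constant.
  static Codec toCodec(String codecAsString) {
    return Codec.valueOf(codecAsString.toUpperCase(Locale.ROOT));
  }

  // Use the delete-specific codec if set, otherwise reuse the data-file codec.
  static Codec deleteCodec(Map<String, String> config, Codec dataCodec) {
    String codecAsString = config.get(DELETE_PARQUET_COMPRESSION);
    return codecAsString != null ? toCodec(codecAsString) : dataCodec;
  }

  public static void main(String[] args) {
    Map<String, String> config = new HashMap<>();
    // Unset: inherits the data codec.
    System.out.println(deleteCodec(config, Codec.ZSTD));
    // Set: the delete-specific value takes precedence.
    config.put(DELETE_PARQUET_COMPRESSION, "snappy");
    System.out.println(deleteCodec(config, Codec.ZSTD));
  }
}
```

Checking for `null` rather than supplying a fixed default is what makes the data codec act as the effective default.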

@rdblue What is your opinion? Do we need to expose them?

Could we change it like this?

| Property | Default | Description |
|---|---|---|
| write.delete.orc.compression-codec | data compression codec | ORC compression codec: zstd, lz4, lzo, zlib, snappy, none |

The current documentation already exposes write.delete.format.default:

| Property | Default | Description |
|---|---|---|
| write.format.default | parquet | Default file format for the table; parquet, avro, or orc |
| write.delete.format.default | data file format | Default delete file format for the table; parquet, avro, or orc |
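The two rows above form a fallback chain: the delete format falls back to the data format, which in turn falls back to `parquet`. A minimal Java sketch of that resolution, assuming table properties are a plain `Map<String, String>` and using a hypothetical `FileFormat` enum rather than Iceberg's own class:

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of the documented format defaults; not Iceberg's real code.
public class DeleteFormatDefault {
  static final String DEFAULT_FILE_FORMAT = "write.format.default";
  static final String DELETE_DEFAULT_FILE_FORMAT = "write.delete.format.default";
  // Documented default for write.format.default.
  static final String DEFAULT_FILE_FORMAT_DEFAULT = "parquet";

  // Hypothetical stand-in for a file-format enum.
  enum FileFormat { PARQUET, AVRO, ORC }

  static FileFormat parse(String name) {
    return FileFormat.valueOf(name.toUpperCase(Locale.ROOT));
  }

  // Data files: write.format.default, else parquet.
  static FileFormat dataFormat(Map<String, String> props) {
    return parse(props.getOrDefault(DEFAULT_FILE_FORMAT, DEFAULT_FILE_FORMAT_DEFAULT));
  }

  // Delete files: write.delete.format.default, else the data file format.
  static FileFormat deleteFormat(Map<String, String> props) {
    String name = props.get(DELETE_DEFAULT_FILE_FORMAT);
    return name != null ? parse(name) : dataFormat(props);
  }

  public static void main(String[] args) {
    Map<String, String> props = new HashMap<>();
    System.out.println(deleteFormat(props)); // nothing set
    props.put(DEFAULT_FILE_FORMAT, "orc");
    System.out.println(deleteFormat(props)); // inherits the data format
    props.put(DELETE_DEFAULT_FILE_FORMAT, "avro");
    System.out.println(deleteFormat(props)); // explicit delete format
  }
}
```

Documenting the codec and level properties with "data value" defaults would describe the same kind of chain that this existing property already follows.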

renshangtao, Jul 04 '22 02:07

I think we should document that these settings exist, noting that each one defaults to the corresponding data file value.

rdblue, Aug 07 '22 17:08