ggershinsky comments


PARQUET-2006: Column resolution by ID

Hi @huaxingao, can you describe the lifecycle of the column IDs at a high level, either in the PR description or in a comment? Where these IDs are stored...

PARQUET-2006: Column resolution by ID

Thanks @huaxingao, one more question / clarification. In the writer,

> field_id has to be unique in the entire schema, otherwise, an Exception will be thrown.

What happens if...

PARQUET-2006: Column resolution by ID

I'll join too.

PARQUET-1950: Define core features

to add the parquet encryption angle to this discussion. This feature adds protection of confidentiality and integrity of parquet files (when they have columns with sensitive data). These security layers...

PARQUET-1950: Define core features

@gszadovszky I certainly agree that the encryption feature is not yet ready to be on this list. According to the definition, we need to "have at least two different implementations that...

Performance optimization to ByteBitPackingValuesReader

Optimizations like using byte arrays instead of byte buffers, and allocating the byte array once only, instead of per operation. Done in a concise manner, without unnecessary code changes. LGTM.

PARQUET-2196: Support LZ4_RAW codec

Yep, we test encryption interop using binary files in the parquet-testing repo. @wgtmac Please have a look at this code: https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/test/java/org/apache/parquet/hadoop/TestEncryptionOptions.java#L130

PARQUET-2196: Support LZ4_RAW codec

> > Yep, we test encryption interop using binary files in the parquet-testing repo. @wgtmac Please have a look at this code: https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/test/java/org/apache/parquet/hadoop/TestEncryptionOptions.java#L130
>
> @ggershinsky @emkornfield Yes I did...

Parquet-MR Encryption - Modify to true to encrypt

This breaks the parquet columnar encryption mode. We use the parquet "uniform" encryption mode instead for file encryption in Iceberg. Please have a look at https://github.com/apache/iceberg/pull/2639

PARQUET-1711: support recursive proto schemas by limiting recursion depth

I would also like to recommend adding @matthieun as a co-author to this PR, per the discussion in the parallel PR.