ggershinsky

19 comments by ggershinsky

Hi @huaxingao, can you describe the lifecycle of the column IDs at a high level, either in the PR description or in a comment? Where these IDs are stored...

Thanks @huaxingao, one more question / clarification. In the writer, "field_id has to be unique in the entire schema, otherwise, an Exception will be thrown." What happens if...

To add the Parquet encryption angle to this discussion: this feature protects the confidentiality and integrity of Parquet files (when they have columns with sensitive data). These security layers...

@gszadovszky I certainly agree the encryption feature is not yet ready to be on this list. According to the definition, we need to "have at least two different implementations that...

Optimizations like using byte arrays instead of byte buffers, and allocating the byte array once only, instead of per operation. Done in a concise manner, without unnecessary code changes. LGTM.

Yep, we test encryption interop using binary files in the parquet-testing repo. @wgtmac Please have a look at this code: https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/test/java/org/apache/parquet/hadoop/TestEncryptionOptions.java#L130

> > Yep, we test encryption interop using binary files in the parquet-testing repo. @wgtmac Please have a look at this code: https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/test/java/org/apache/parquet/hadoop/TestEncryptionOptions.java#L130 > > @ggershinsky @emkornfield Yes I did...

This breaks the parquet columnar encryption mode. We use the parquet "uniform" encryption mode instead for file encryption in Iceberg. Please have a look at https://github.com/apache/iceberg/pull/2639

I would also like to recommend adding @matthieun as a co-author to this PR, per the discussion in the parallel PR.