parquet-java

GH-3223: Implement Variant parquet writer

cashmand opened this pull request • 2 comments

Rationale for this change

Provide a reference implementation for writing Variant values to Parquet, based on the Variant shredding spec in https://github.com/apache/parquet-format/blob/master/VariantShredding.md.

What changes are included in this PR?

Adds a VariantValueWriter to write Variant values to a Parquet file with a shredding schema.

The PR also adds support in parquet-avro for identifying Variant fields in a Parquet schema and writing them using the shredding schema.
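To illustrate what writing "with a shredding schema" involves, here is a minimal, hypothetical Java sketch of the core shredding rules from the VariantShredding spec linked above. It is not the `VariantValueWriter` API added by this PR: the names `ShreddingSketch`, `shredScalar`, and `shredObject` are invented for illustration. Binary Variant encoding is also elided, so the `value` slot simply holds the Java object a real writer would encode. A scalar whose type matches the schema goes to `typed_value`; a mismatch falls back to `value`; a shredded object field that is missing leaves both columns null; and fields not in the schema stay in the object's residual `value`.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical illustration of the VariantShredding.md rules; not the
// parquet-java VariantValueWriter API. Binary Variant encoding is elided:
// the "value" slot holds the Java object a real writer would encode.
public class ShreddingSketch {

  /** One shredded slot: at most one of typedValue / value is non-null. */
  public record Shredded(Object typedValue, Object value) {}

  /** Per-field slots for shredded fields, plus the unshredded leftovers. */
  public record ShreddedObject(Map<String, Shredded> typedFields, Map<String, Object> residual) {}

  /** Shreds a non-null scalar against a typed_value column of type typedType. */
  public static Shredded shredScalar(Object v, Class<?> typedType) {
    Objects.requireNonNull(v, "explicit Variant nulls are not modeled in this sketch");
    if (typedType.isInstance(v)) {
      return new Shredded(v, null); // type matches: write typed_value only
    }
    return new Shredded(null, v); // mismatch: fall back to the value column
  }

  /** Partially shreds an object: schema fields go to typed_value, the rest to value. */
  public static ShreddedObject shredObject(Map<String, Object> obj, Map<String, Class<?>> schema) {
    Map<String, Shredded> typed = new LinkedHashMap<>();
    for (Map.Entry<String, Class<?>> field : schema.entrySet()) {
      String name = field.getKey();
      typed.put(name, obj.containsKey(name)
          ? shredScalar(obj.get(name), field.getValue())
          : new Shredded(null, null)); // field missing: both columns null
    }
    Map<String, Object> residual = new LinkedHashMap<>();
    for (Map.Entry<String, Object> e : obj.entrySet()) {
      if (!schema.containsKey(e.getKey())) {
        residual.put(e.getKey(), e.getValue()); // not in the shredding schema
      }
    }
    // If every field was shredded, the object's value column is null.
    return new ShreddedObject(typed, residual.isEmpty() ? null : residual);
  }

  public static void main(String[] args) {
    Map<String, Object> row = new LinkedHashMap<>();
    row.put("id", 42L);
    row.put("name", 7);    // schema expects a String: falls back to value
    row.put("extra", "x"); // not in the schema: kept in the residual value
    Map<String, Class<?>> schema = new LinkedHashMap<>();
    schema.put("id", Long.class);
    schema.put("name", String.class);
    schema.put("ts", Long.class); // absent from the row: both columns null
    System.out.println(shredObject(row, schema));
  }
}
```

This sketch handles only one level of an object with scalar fields. A real writer would also recurse into nested objects and arrays and encode `value` in the Variant binary format.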

Are these changes tested?

Unit tests are included in the PR, based on similar tests in https://github.com/apache/iceberg/blob/main/parquet/src/test/java/org/apache/iceberg/parquet/TestVariantWriters.java.

Are there any user-facing changes?

Connectors can now write to a shredded Variant table, and the parquet-avro connector will do this if a shredding schema is provided.

Closes #3223

cashmand, May 20 '25 21:05

Thanks for the updates, @cashmand! I think this looks good to go. Running the tests again.

rdblue, Jun 20 '25 23:06

Running tests

rdblue, Jun 23 '25 21:06

Merged! Thanks @cashmand for getting this in. Nice work!

rdblue, Jun 27 '25 19:06