
Iceberg table properties not replaced when DataFrameWriterV2 replace() is used

Open swapna267 opened this issue 3 years ago • 3 comments

According to the replace() documentation, table properties should be replaced with the ones associated with the data frame:

"The existing table's schema, partition layout, properties, and other configuration will be replaced with the contents of the data frame and the configuration set on this writer." https://spark.apache.org/docs/3.2.0/api/java/org/apache/spark/sql/DataFrameWriterV2.html#replace--

For Iceberg tables, the current behavior is that table properties are updated but not replaced (see https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/TableMetadata.java#L655). Is this the expected behavior?

swapna267, Sep 22 '22 17:09

Regarding https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/TableMetadata.java#L655: is there any specific reason for updating the properties rather than replacing them? @rdblue @aokolnychyi @RussellSpitzer @flyrain @szehon-ho

karuppayya, Sep 22 '22 17:09

I'm not aware of the reason for the original decision, but replacing makes more sense to me. Lingering properties could cause unexpected behavior and surprise users.

flyrain, Sep 22 '22 18:09

The new properties should be merged, not replaced. There are many properties that should live with the table across versions. For instance, the compression codec is based on the data in the table and should not be dropped just because the operation replacing the table's data doesn't specify one. Instead, properties should be respected by always merging them with the existing ones.

rdblue, Sep 22 '22 18:09
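To make the trade-off in this thread concrete, here is a minimal Java sketch of the two semantics applied to plain property maps. It is not Iceberg's `TableMetadata` code; the class and method names (`PropertySemantics`, `merge`, `replace`) and the `custom.owner` property are hypothetical. Only `write.parquet.compression-codec` is a real Iceberg table property.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PropertySemantics {

    // Merge (the behavior described for Iceberg in this issue): start from the
    // existing properties and overlay the new ones. Keys that exist only in the
    // old table survive; keys present in both take the new value.
    static Map<String, String> merge(Map<String, String> existing, Map<String, String> updates) {
        Map<String, String> result = new LinkedHashMap<>(existing);
        result.putAll(updates);
        return result;
    }

    // Replace (one reading of the Spark replace() docs): keep only the
    // properties supplied by the new writer; everything else is dropped.
    static Map<String, String> replace(Map<String, String> updates) {
        return new LinkedHashMap<>(updates);
    }

    public static void main(String[] args) {
        Map<String, String> existing = new LinkedHashMap<>();
        existing.put("write.parquet.compression-codec", "zstd"); // real Iceberg property
        existing.put("custom.owner", "team-a");                   // hypothetical property

        // The replacing write sets only the hypothetical owner property.
        Map<String, String> updates = new LinkedHashMap<>();
        updates.put("custom.owner", "team-b");

        System.out.println("merge:   " + merge(existing, updates));
        System.out.println("replace: " + replace(updates));
    }
}
```

Running `main` prints `merge:   {write.parquet.compression-codec=zstd, custom.owner=team-b}` followed by `replace: {custom.owner=team-b}`. Under merge, the codec survives a replace that never mentions it, which is what rdblue argues for. The cost is that any stale property also lingers, which is flyrain's concern. Under replace, the result is exactly what the new writer specified, but the codec is silently dropped.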

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.

github-actions[bot], Mar 22 '23 00:03

This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'.

github-actions[bot], Apr 05 '23 00:04