iceberg icon indicating copy to clipboard operation
iceberg copied to clipboard

AWS: Retain Glue Catalog column comment after updating Iceberg table

Open lawofcycles opened this issue 1 year ago • 7 comments

What this PR does

This PR addresses the issue where column comments in the Glue catalog are not retained when updating an existing Iceberg table schema. It fixes the problem by modifying the IcebergToGlueConverter.setTableInputInformation method to preserve existing column comments from the Glue table when creating a new TableInput for an update operation.

When an existing Glue table is provided, the modified method retrieves the comments from its columns and applies them to the corresponding columns in the new TableInput, but only if the Iceberg table's column does not already have a comment.

This change ensures that any user-defined column comments in the Glue catalog are carried over during table updates, preventing unintentional loss of documentation.

Testing

Added a new unit test method testSetTableInputInformationWithExistingTable to verify that column comments from an existing Glue table are correctly preserved when creating a new TableInput.
I'll also add Integration Tests same as #10199.

Fixes #10220

lawofcycles avatar May 06 '24 16:05 lawofcycles

I built the jars and manually tested on Amazon EMR as a Spark runtime.

lawofcycles avatar May 12 '24 00:05 lawofcycles

Hi @geruh would you review this PR? I noticed you are reviewing https://github.com/apache/iceberg/pull/10199, so I'd like to ask for this similar issue.

lawofcycles avatar May 13 '24 01:05 lawofcycles

@geruh Thank you for your review! I have pushed the changes addressing your review comments. Could you please take another look when you have a chance?

lawofcycles avatar May 26 '24 19:05 lawofcycles

Hi @geruh would you review the latest change?

lawofcycles avatar Jun 03 '24 21:06 lawofcycles

Hi @amogh-jahagirdar would you review the latest change? I noticed you are reviewing https://github.com/apache/iceberg/pull/10199, so I'd like to ask for this similar issue.

lawofcycles avatar Jun 08 '24 23:06 lawofcycles

Hi @amogh-jahagirdar,

I hope this message finds you well. I wanted to kindly follow up on this PR, which addresses an important issue with preserving Glue Catalog column comments when updating Iceberg tables (#10220).

Could you please take a look when you have a moment? Thank you for your time and consideration.

lawofcycles avatar Jun 25 '24 15:06 lawofcycles

Really sorry for the delayed review @lawofcycles , I'll take a look today.

amogh-jahagirdar avatar Jun 26 '24 18:06 amogh-jahagirdar

Let's get this in, thanks @lawofcycles for working on this, and @amogh-jahagirdar and @rahil-c for the review!

Fokko avatar Jul 05 '24 06:07 Fokko

Thanks for reviewing @amogh-jahagirdar , @aajisaka , @geruh and @rahil-c !!

lawofcycles avatar Jul 05 '24 09:07 lawofcycles

Thank you for merging @Fokko !

lawofcycles avatar Jul 05 '24 09:07 lawofcycles