aws-sdk-pandas Athena to_iceberg method not writing data to columns that are new in the schema

Describe the bug

I have a table that was created by a Glue job. I want to append data to that table using AWS Wrangler. The writing process seems to work fine, but when I check in Athena, the columns that were not there before have been added but appear completely empty, even though there were no nulls in my DataFrame.

If I delete the rows I appended and write the same data again using AWS Wrangler, the table is updated correctly, since the columns are no longer new.

How to Reproduce

I tried reproducing the issue using only AWS Wrangler and could not. To reproduce it, have a Glue job create an Iceberg table, then use Wrangler to append data that includes an extra column to that table.
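A minimal sketch of the Wrangler side of that reproduction. It assumes the Glue-created Iceberg table already exists with Name and Age columns, that AWS credentials are configured, and that the append is done with schema_evolution=True (which is what lets to_iceberg add the new column). The database, table and S3 temp path are placeholders, and the function name is hypothetical.

```python
def reproduce(database: str, table: str, temp_path: str) -> "pandas.DataFrame":
    """Append rows with a new column to an existing (Glue-created) Iceberg table.

    database, table and temp_path are placeholders; AWS credentials are assumed.
    """
    # Imported lazily so the sketch can be loaded without AWS access
    import awswrangler as wr
    import pandas as pd

    df = pd.DataFrame(
        {
            "Name": ["Paula", "Paul"],
            "Age": [25, 28],
            "City": ["Munich", "Buenos Aires"],  # column not yet in the table schema
        }
    )
    wr.athena.to_iceberg(
        df=df,
        database=database,
        table=table,
        temp_path=temp_path,
        mode="append",
        schema_evolution=True,  # allow to_iceberg to add the City column
    )
    # Read the appended rows back; per this report, City comes back empty
    return wr.athena.read_sql_query(
        f"SELECT * FROM {table} WHERE name IN ('Paula', 'Paul')",
        database=database,
        ctas_approach=False,
    )
```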

Expected behavior

No response

Your project

No response

Screenshots

No response

OS

Mac

Python version

3.10

AWS SDK for pandas version

3.9.0

Additional context

No response

Oct 16 '24 12:10 lautarortega

Hi @lautarortega thanks for opening this!

The columns that were not there before are added but appear to be completely empty, while there were no nulls in my dataframe

Just to confirm that I understand the issue: since you are appending data, any existing rows in the table would not have values for the new columns. Are you appending or overwriting? Or is the table empty when you append?

Note: a fix for a related issue regarding how Iceberg treats new columns was merged in https://github.com/aws/aws-sdk-pandas/pull/2982. I recommend upgrading to AWS SDK for pandas 3.10.0.

Nov 15 '24 12:11 kukushking

Hi @kukushking, thanks for reaching out.

I am appending data to the table. The problem is with the new data. Athena table before the append:

Name	Age
John	25
Jane	32
Bob	45
Alice	28

Local DataFrame:

Name	Age	City
Paula	25	Munich
Paul	28	Buenos Aires

Athena table after the append:

Name	Age	City
John	25
Jane	32
Bob	45
Alice	28
Paula	25
Paul	28

Expected table:

Name	Age	City
John	25
Jane	32
Bob	45
Alice	28
Paula	25	Munich
Paul	28	Buenos Aires
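In pandas terms, the expected table is what an outer concatenation of the two frames produces: the new City column is null only for the rows that existed before the append. A small sketch using the data from the tables above (pure pandas, no AWS involved):

```python
import pandas as pd

# Table contents before the append
existing = pd.DataFrame(
    {"Name": ["John", "Jane", "Bob", "Alice"], "Age": [25, 32, 45, 28]}
)
# Local DataFrame being appended; City is the new column
new_rows = pd.DataFrame(
    {"Name": ["Paula", "Paul"], "Age": [25, 28], "City": ["Munich", "Buenos Aires"]}
)

# Union of both schemas: old rows get NaN for City, new rows keep their values
expected = pd.concat([existing, new_rows], ignore_index=True)
print(expected)
```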

I did some tests, and I think it might be related to the fact that the table was created by a Glue job. When I create a table from scratch with AWS Wrangler and only use Wrangler, it seems to work fine.

My current workaround is to write once. That write has missing data, but it updates the schema. I then delete that last batch of data from the Athena console and write the data again. I think that not having to evolve the schema on the second write is what makes it work.
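The workaround above could be scripted roughly as follows. This is only a sketch, not a verified fix: it assumes configured AWS credentials, placeholder names, and a hypothetical batch_predicate SQL condition that matches only the rows of this batch (for example a load ID or load date filter). It relies on Athena supporting DELETE FROM on Iceberg tables.

```python
def append_with_workaround(df, database: str, table: str, temp_path: str, batch_predicate: str) -> None:
    """Write once to evolve the schema, delete that batch, then write it again.

    batch_predicate is a hypothetical SQL condition (e.g. "load_id = 42") that
    must match only the rows in df; the other arguments are placeholders.
    """
    import awswrangler as wr  # imported lazily so the sketch loads without AWS access

    write_kwargs = dict(
        df=df,
        database=database,
        table=table,
        temp_path=temp_path,
        mode="append",
        schema_evolution=True,
    )
    # 1. First write: adds the new columns to the schema (their values may come back empty)
    wr.athena.to_iceberg(**write_kwargs)
    # 2. Remove the batch that was just written
    wr.athena.start_query_execution(
        sql=f"DELETE FROM {table} WHERE {batch_predicate}",
        database=database,
        wait=True,
    )
    # 3. Second write: the columns are no longer new, so (per this thread) values are stored
    wr.athena.to_iceberg(**write_kwargs)
```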

Nov 18 '24 09:11 lautarortega

Hi @lautarortega, thanks. Which version of AWS SDK for pandas are you using? The pull request I linked above fixes how the current Iceberg columns are represented, and version 3.10.0 should display data for all columns. Additionally, verify that the latest Glue schema contains the new column.
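One way to do that Glue schema check is to inspect the GetTable response. A minimal sketch, assuming the dictionary comes from boto3's glue.get_table(DatabaseName=..., Name=...); the helper name and the trimmed sample response are hypothetical. Names are compared case-insensitively because Athena treats column names as lowercase.

```python
from typing import Any, Iterable


def missing_glue_columns(get_table_response: dict[str, Any], expected: Iterable[str]) -> list[str]:
    """Return the expected column names that are absent from a Glue table schema."""
    columns = get_table_response["Table"]["StorageDescriptor"]["Columns"]
    present = {col["Name"].lower() for col in columns}
    return [name for name in expected if name.lower() not in present]


# In practice (requires AWS credentials):
#   import boto3
#   response = boto3.client("glue").get_table(DatabaseName="my_db", Name="my_table")
# A trimmed-down response shaped like the pre-append table:
fake_response = {
    "Table": {
        "StorageDescriptor": {
            "Columns": [
                {"Name": "name", "Type": "string"},
                {"Name": "age", "Type": "int"},
            ]
        }
    }
}
print(missing_glue_columns(fake_response, ["Name", "Age", "City"]))  # ['City']
```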

Nov 19 '24 12:11 kukushking

I was running 3.9.0. Today I tested 3.10.1, and it fails in a new way that 3.9.0 did not. I debugged the execution: in the _determine_differences method, it fails to get the catalog_column_types. This is related to the change in how current columns are handled.

_utils.py, line 41:

    if not filter_iceberg_current or col.get("Parameters", {}).get("iceberg.field.current") == "true":
        dtypes[col["Name"]] = col["Type"]

These are the parameters my fields have, with nothing related to iceberg.field.current:

'Parameters': {'isPrimaryKey': 'false', 'containsPii': 'false'}}

When I open the table in Athena through the AWS Console, I get the "current" fields: the old ones are not shown, but the intended ones are. Is there any particular action that should be taken to get this parameter set?
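The effect of that line can be reproduced offline. The sketch below re-implements only the quoted condition as a standalone function; the function name and the sample columns are illustrative, not the library's actual code. With filter_iceberg_current enabled, every column whose Parameters lack iceberg.field.current is skipped, so a table whose columns carry only isPrimaryKey/containsPii yields no column types at all.

```python
from typing import Any


def extract_dtypes(columns: list[dict[str, Any]], filter_iceberg_current: bool) -> dict[str, str]:
    """Mirror of the condition quoted from _utils.py (illustrative, not the library code)."""
    dtypes: dict[str, str] = {}
    for col in columns:
        # Keep the column unless filtering is on and it is not flagged as current
        if not filter_iceberg_current or col.get("Parameters", {}).get("iceberg.field.current") == "true":
            dtypes[col["Name"]] = col["Type"]
    return dtypes


# Columns shaped like the ones reported for the Glue-created table
glue_job_columns = [
    {"Name": "name", "Type": "string", "Parameters": {"isPrimaryKey": "false", "containsPii": "false"}},
    {"Name": "age", "Type": "int", "Parameters": {"isPrimaryKey": "false", "containsPii": "false"}},
]

print(extract_dtypes(glue_job_columns, filter_iceberg_current=False))  # {'name': 'string', 'age': 'int'}
print(extract_dtypes(glue_job_columns, filter_iceberg_current=True))   # {}
```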

Dec 06 '24 15:12 lautarortega

I'm having the same issue as in the original bug description, with Python 3.12.6 and AWS SDK for pandas 3.12.1.

Aug 05 '25 00:08 Braalfa

Marking this issue as stale due to inactivity. This helps our maintainers find and focus on the active issues. If this issue receives no comments in the next 7 days it will automatically be closed.

Nov 02 '25 18:11 github-actions[bot]
