aws-sdk-pandas

Iceberg partitioning based on transformed DataFrame columns not supported?

Open petebachant opened this issue 1 year ago • 1 comments

Describe the bug

I was hoping that something like partition_cols=["user_id", "month(ts)"] would work nicely in athena.to_iceberg, but I end up with a KeyError noting that "month(ts)" doesn't exist in the DataFrame. However, the table is created properly based on the output of athena.show_create_table.

How to Reproduce

Use a partitioning function in one of the partition_cols passed into to_iceberg.
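A minimal reproduction sketch, assuming a small DataFrame with `user_id` and `ts` columns. The database, table, and S3 path are hypothetical placeholders, and the `to_iceberg` call is wrapped in a function so nothing touches AWS unless it is invoked explicitly. The final check only illustrates that `"month(ts)"` is a transform expression rather than a DataFrame column; it is not the library's own code.

```python
import pandas as pd

# Partition spec from the report: one plain column and one transform expression.
PARTITION_COLS = ["user_id", "month(ts)"]


def make_frame() -> pd.DataFrame:
    """Build a tiny example frame (the values are made up for illustration)."""
    return pd.DataFrame(
        {
            "user_id": [1, 2],
            "ts": pd.to_datetime(["2024-01-15", "2024-02-20"]),
            "value": [10.0, 20.0],
        }
    )


def write_to_iceberg(df: pd.DataFrame) -> None:
    """Call that reportedly raises KeyError on 3.6.0 (requires AWS access)."""
    import awswrangler as wr  # imported lazily so the sketch runs without it

    wr.athena.to_iceberg(
        df=df,
        database="my_db",  # hypothetical Glue database
        table="my_table",  # hypothetical table name
        temp_path="s3://my-bucket/tmp/",  # hypothetical S3 staging path
        partition_cols=PARTITION_COLS,
    )


df = make_frame()

# "month(ts)" is not a column name, so any lookup by that name in the
# DataFrame cannot succeed.
missing = [col for col in PARTITION_COLS if col not in df.columns]
print(missing)
```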

Expected behavior

I expect to not get a KeyError.

Your project

No response

Screenshots

No response

OS

Mac

Python version

3.11

AWS SDK for pandas version

3.6.0

Additional context

No response

petebachant avatar Feb 22 '24 21:02 petebachant

Hi @petebachant, unfortunately only plain partition columns are supported at the moment (no partition transform functions). You can transform the data in your DataFrame prior to the insert.
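A rough sketch of that workaround, assuming a timestamp column named `ts`: derive a month column in pandas and partition on it as a plain column. The `ts_month` column name, the database, the table, and the S3 path are hypothetical. Note that this is not equivalent to Iceberg's `month(ts)` transform, since the derived column becomes a real column in the table schema.

```python
import pandas as pd


def add_month_column(
    df: pd.DataFrame, ts_col: str = "ts", out_col: str = "ts_month"
) -> pd.DataFrame:
    """Return a copy of df with a 'YYYY-MM' string column derived from ts_col."""
    out = df.copy()
    out[out_col] = out[ts_col].dt.strftime("%Y-%m")
    return out


def write_partitioned(df: pd.DataFrame) -> None:
    """Write using only plain column names as partitions (requires AWS access)."""
    import awswrangler as wr  # imported lazily so the sketch runs without it

    wr.athena.to_iceberg(
        df=df,
        database="my_db",  # hypothetical Glue database
        table="my_table",  # hypothetical table name
        temp_path="s3://my-bucket/tmp/",  # hypothetical S3 staging path
        partition_cols=["user_id", "ts_month"],
    )


events = pd.DataFrame(
    {
        "user_id": [1, 2],
        "ts": pd.to_datetime(["2024-01-15", "2024-02-20"]),
    }
)
prepared = add_month_column(events)
print(prepared)
```

Calling `write_partitioned(prepared)` would then pass only existing column names in `partition_cols`.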

kukushking avatar Feb 26 '24 18:02 kukushking

Hi @kukushking, that's not what the documentation says (screenshot of the documentation omitted):

https://aws-sdk-pandas.readthedocs.io/en/stable/stubs/awswrangler.athena.to_iceberg.html#awswrangler.athena.to_iceberg

I was using append mode successfully a couple of days ago on version 3.4, but it now fails on version 3.6. I tried to check the wrangler code, but I do not fully understand why I did not get the error in 3.4, since the code involved in this error does not seem to have been updated since version 3.4.

alvaro-ponce avatar Feb 29 '24 15:02 alvaro-ponce

OK, it was 3.3.0, not 3.4 as I said before. I have checked, and in version 3.3.0 I can append multiple times with a function in the partition cols.

I believe the problem appeared in version 3.4.1 with the addition of the _determine_differences function in the _write_iceberg.py script.

alvaro-ponce avatar Feb 29 '24 16:02 alvaro-ponce

Hey, thanks for bringing this up. You are correct, this feature was broken in the PR you tagged. I'm looking into whether we can fix this feature. If not, I will update the documentation.

LeonLuttenberger avatar Feb 29 '24 17:02 LeonLuttenberger

Thanks for the quick response and work!

You are awesome!

alvaro-ponce avatar Feb 29 '24 18:02 alvaro-ponce