aws-sdk-pandas
Temp table is sometimes not cleaned up when to_iceberg with overwrite_partitions hits an Iceberg commit error
Describe the bug
When concurrent processes attempt to update the same Iceberg table and one hits an ICEBERG_COMMIT_ERROR, the temp table created by delete_from_iceberg_table sometimes fails to get cleaned up. Reading through the source code, I don't see a clear path for how this could happen, since the code appears to catch the exception correctly and clean up in a finally block. Nonetheless, temp tables are being left behind.
I have not been able to write a minimal code sample that reproduces this behavior reliably, but in production we occasionally hit commit errors and accumulate temp tables in the Glue catalog as a result:
awswrangler.exceptions.QueryFailed: ICEBERG_COMMIT_ERROR: Failed to commit Iceberg update to table
Traceback, as logged at 2024-08-07 08:06:24:
    wr.athena.to_iceberg(**write_args)
  File "/usr/local/lib/python3.11/site-packages/awswrangler/_config.py", line 715, in wrapper
    return function(**args)
           ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/awswrangler/_utils.py", line 178, in inner
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/awswrangler/athena/_write_iceberg.py", line 452, in to_iceberg
    delete_from_iceberg_table(
  File "/usr/local/lib/python3.11/site-packages/awswrangler/_config.py", line 715, in wrapper
    return function(**args)
           ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/awswrangler/_utils.py", line 178, in inner
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/awswrangler/athena/_write_iceberg.py", line 680, in delete_from_iceberg_table
    wait_query(query_execution_id=query_execution_id, boto3_session=boto3_session)
  File "/usr/local/lib/python3.11/site-packages/awswrangler/_config.py", line 715, in wrapper
    return function(**args)
           ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/awswrangler/athena/_executions.py", line 237, in wait_query
    raise exceptions.QueryFailed(response["Status"].get("StateChangeReason"))
How to Reproduce
Broadly: Have two processes call to_iceberg on the same catalog table simultaneously, with write params similar to:
{
    "df": to_write.copy(),
    "mode": "overwrite_partitions",
    "partition_cols": self.partition_cols,
    "database": self.glue_database_name,
    "table": self.table_name,
    "table_location": self.table_s3_path(),
    "temp_path": tmp_table_path,
    "boto3_session": self.boto3_client_wrapper.get_session(),
    "schema_evolution": True,
    "fill_missing_columns_in_df": True,
    "keep_files": False,
}
If a commit error occurs, sometimes a temp table is left behind.
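For anyone trying to reproduce this, here is a rough sketch of a harness that fires the same write from several workers at once and collects any exceptions. It is only an illustration: `run_concurrently` is a hypothetical helper, not part of awswrangler, and it uses threads rather than the separate processes we run in production, so it may not trigger the same race.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional


def run_concurrently(
    write: Callable[[], None], workers: int = 2
) -> List[Optional[BaseException]]:
    """Call `write` from `workers` threads at once.

    Returns one entry per call: None on success, or the exception raised.
    `write` is a placeholder for the real write call (hypothetical wiring).
    """

    def attempt(_: int) -> Optional[BaseException]:
        try:
            write()
            return None
        except Exception as exc:  # e.g. awswrangler.exceptions.QueryFailed
            return exc

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(attempt, range(workers)))
```

In our setup the callable would be something like `lambda: wr.athena.to_iceberg(**write_args)` with the parameters above. After each run, any call that returned an ICEBERG_COMMIT_ERROR is a candidate for having left a temp table in the catalog.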
Expected behavior
No response
Your project
No response
Screenshots
No response
OS
Linux
Python version
3.11
AWS SDK for pandas version
3.7.3
Additional context
No response
This seems similar to https://github.com/aws/aws-sdk-pandas/issues/2826. It looks like it was hard to reproduce back then too.
Interesting, yes, that seems very similar. It makes me wonder whether there is a consistency issue with the Glue API, such that a delete on an existing table can fail with a table-not-found error if it happens too soon after the table is created. I can't find anything in the docs specific to that, though.
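If that hypothesis is right (it is unconfirmed), one possible workaround is to retry the delete until the catalog agrees the table is gone, instead of trusting a single "not found" answer. A minimal sketch, where `delete_until_gone` and its parameters are hypothetical names and the two callables stand in for whatever catalog calls you use:

```python
import time
from typing import Callable


def delete_until_gone(
    delete: Callable[[], bool],
    exists: Callable[[], bool],
    attempts: int = 5,
    delay_seconds: float = 1.0,
) -> bool:
    """Retry `delete` until `exists` reports the table is gone.

    Returns True once the table is confirmed absent, False if it is
    still present after `attempts` tries.
    """
    for attempt in range(attempts):
        delete()  # may report "not found" if the catalog has not caught up
        if not exists():
            return True
        if attempt < attempts - 1:
            time.sleep(delay_seconds)
    return False
```

With awswrangler this could be wired up as `delete=lambda: wr.catalog.delete_table_if_exists(database=db, table=tmp)` and `exists=lambda: wr.catalog.does_table_exist(database=db, table=tmp)`, where `db` and `tmp` are placeholders. Note that if the existence check is subject to the same lag, this would not help either.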
Marking this issue as stale due to inactivity. This helps our maintainers find and focus on the active issues. If this issue receives no comments in the next 7 days it will automatically be closed.