aws-sdk-pandas
fix(athena): fixed write_iceberg which could fire an ICEBERG_TOO_MANY_OPEN_PARTITIONS Athena error
Feature or Bugfix
Bugfix
Detail
Athena: fixed write_iceberg which could fire an ICEBERG_TOO_MANY_OPEN_PARTITIONS Athena error
Relates
None
Details
This is not really a bugfix, given that the problem lies with Athena, not aws-sdk-pandas.
When you write a partitioned Iceberg table with Athena and the write creates more than 100 distinct partitions, Athena fails with the error ICEBERG_TOO_MANY_OPEN_PARTITIONS (see post here). This limitation makes Iceberg really difficult to use at scale with Athena. One workaround is of course to use Spark, but I think it would be great if aws-sdk-pandas included a workaround.
My approach is simple: split the dataframe to write into chunks that each contain at most 100 distinct partition key combinations, then write the chunks sequentially using Athena.
I also edited the unit tests to make them work.
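To illustrate the chunking idea described above, here is a minimal sketch in plain pandas. It is not the code from this pull request: the function name `chunk_by_partitions`, its parameters, and the constant `MAX_OPEN_PARTITIONS` are hypothetical names chosen for the example, and it assumes the partition columns contain no null values.

```python
import pandas as pd

# Assumption: the limit of 100 partitions per write described above.
MAX_OPEN_PARTITIONS = 100


def chunk_by_partitions(df, partition_cols, max_partitions=MAX_OPEN_PARTITIONS):
    """Yield sub-frames of `df`, each holding at most `max_partitions`
    distinct combinations of values in `partition_cols`.

    Hypothetical helper for illustration; assumes no nulls in `partition_cols`.
    """
    if df.empty:
        return
    # Give every row the integer id of its partition-key combination,
    # numbered in order of first appearance.
    group_ids = df.groupby(partition_cols, sort=False).ngroup()
    n_groups = int(group_ids.max()) + 1
    # Slice the id range into windows of `max_partitions` combinations.
    for start in range(0, n_groups, max_partitions):
        mask = (group_ids >= start) & (group_ids < start + max_partitions)
        yield df[mask]


if __name__ == "__main__":
    # 1000 rows spread over 250 distinct partition values.
    frame = pd.DataFrame({"part": [i % 250 for i in range(1000)], "value": range(1000)})
    for chunk in chunk_by_partitions(frame, ["part"]):
        print(len(chunk), chunk["part"].nunique())
```

Each yielded chunk would then be written in turn, so that no single Athena write touches more than the allowed number of partitions. Every row lands in exactly one chunk, because all rows sharing a partition-key combination share the same group id.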
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
AWS CodeBuild CI Report
- CodeBuild project: GitHubCodeBuild8756EF16-4rfo0GHQ0u9a
- Commit ID: bb9b67a914f8af4d1e9e51f3cb20af4e1c175dc1
- Result: FAILED
- Build Logs (available for 30 days)
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository
AWS CodeBuild CI Report
- CodeBuild project: GitHubCodeBuild8756EF16-4rfo0GHQ0u9a
- Commit ID: 3d720797b4a1fcdf50237550f2d89ed0ce6caab1
- Result: FAILED
- Build Logs (available for 30 days)
AWS CodeBuild CI Report
- CodeBuild project: GitHubCodeBuild8756EF16-4rfo0GHQ0u9a
- Commit ID: 8381bc08dadadf2dcd55de07476b510ccb38c51e
- Result: FAILED
- Build Logs (available for 30 days)
AWS CodeBuild CI Report
- CodeBuild project: GitHubDistributedCodeBuild6-jWcl5DLmvupS
- Commit ID: 8381bc08dadadf2dcd55de07476b510ccb38c51e
- Result: FAILED
- Build Logs (available for 30 days)
AWS CodeBuild CI Report
- CodeBuild project: GitHubCodeBuild8756EF16-4rfo0GHQ0u9a
- Commit ID: bc4a6f2104b92fc551a4b6f4a0a128a1fe26471c
- Result: SUCCEEDED
- Build Logs (available for 30 days)
AWS CodeBuild CI Report
- CodeBuild project: GitHubDistributedCodeBuild6-jWcl5DLmvupS
- Commit ID: bc4a6f2104b92fc551a4b6f4a0a128a1fe26471c
- Result: FAILED
- Build Logs (available for 30 days)
AWS CodeBuild CI Report
- CodeBuild project: GitHubCodeBuild8756EF16-4rfo0GHQ0u9a
- Commit ID: 0cd9098630b354ef341c9d4c8ad400a3ee298b2c
- Result: FAILED
- Build Logs (available for 30 days)
AWS CodeBuild CI Report
- CodeBuild project: GitHubCodeBuild8756EF16-4rfo0GHQ0u9a
- Commit ID: 7be568b67bcdaf29881e8453c76b19113c7e7968
- Result: SUCCEEDED
- Build Logs (available for 30 days)
AWS CodeBuild CI Report
- CodeBuild project: GitHubDistributedCodeBuild6-jWcl5DLmvupS
- Commit ID: 0cd9098630b354ef341c9d4c8ad400a3ee298b2c
- Result: FAILED
- Build Logs (available for 30 days)
AWS CodeBuild CI Report
- CodeBuild project: GitHubDistributedCodeBuild6-jWcl5DLmvupS
- Commit ID: 7be568b67bcdaf29881e8453c76b19113c7e7968
- Result: FAILED
- Build Logs (available for 30 days)