fix(athena): fixed write_iceberg which could fire an ICEBERG_TOO_MANY_OPEN_PARTITIONS Athena error

Open erwan-simon opened this issue 1 year ago • 10 comments

Feature or Bugfix

Bugfix

Detail

Athena: fixed write_iceberg which could fire an ICEBERG_TOO_MANY_OPEN_PARTITIONS Athena error

Relates

None

Details

Strictly speaking, this is not a bugfix, since the problem lies with Athena rather than aws-sdk-pandas.

When you write to a partitioned Iceberg table with Athena and the write creates more than 100 distinct partitions, Athena fails with the error ICEBERG_TOO_MANY_OPEN_PARTITIONS (see post here). This limitation makes it really difficult to use Iceberg at scale with Athena. One workaround is of course to use Spark, but I think it would be great if aws-sdk-pandas included a workaround of its own.

My approach to this problem is simple: split the dataframe into chunks that each contain at most 100 distinct partition key combinations, then write the chunks sequentially using Athena.
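The chunking idea can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the helper name `chunk_by_partition_keys` and the `write_chunk` callback are hypothetical, and the sketch assumes that each distinct combination of partition column values maps to exactly one Iceberg partition (partition transforms such as `day(ts)` would need extra handling).

```python
from typing import Callable, Iterator, List

import pandas as pd

# Limit quoted in the PR description; treat it as an assumption, not a documented constant.
MAX_OPEN_PARTITIONS = 100


def chunk_by_partition_keys(
    df: pd.DataFrame,
    partition_cols: List[str],
    max_partitions: int = MAX_OPEN_PARTITIONS,
) -> Iterator[pd.DataFrame]:
    """Yield sub-frames holding at most `max_partitions` distinct key combinations."""
    if not partition_cols:
        # Unpartitioned table: nothing to split.
        yield df
        return
    # ngroup() numbers each distinct key combination 0..n-1 (order of first appearance).
    group_ids = df.groupby(partition_cols, sort=False, dropna=False).ngroup()
    # Integer division buckets the combinations into batches of `max_partitions`.
    batch_ids = (group_ids // max_partitions).to_numpy()
    for _, chunk in df.groupby(batch_ids, sort=True):
        yield chunk


def write_in_chunks(
    df: pd.DataFrame,
    partition_cols: List[str],
    write_chunk: Callable[[pd.DataFrame], None],
) -> int:
    """Write each chunk sequentially via a caller-supplied (hypothetical) writer."""
    n_chunks = 0
    for chunk in chunk_by_partition_keys(df, partition_cols):
        write_chunk(chunk)  # e.g. a wrapper around the library's Athena Iceberg writer
        n_chunks += 1
    return n_chunks


if __name__ == "__main__":
    # 250 distinct keys with 2 rows each, so three sequential writes are expected.
    demo = pd.DataFrame({"k": [i // 2 for i in range(500)], "v": range(500)})
    written: List[pd.DataFrame] = []
    print(write_in_chunks(demo, ["k"], written.append))
```

Writing sequentially trades throughput for staying under the limit: each Athena insert only touches a bounded number of partitions, at the cost of one query per chunk.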

I also edited the unit tests to make them work.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

erwan-simon • Jul 22 '24 14:07

AWS CodeBuild CI Report

  • CodeBuild project: GitHubCodeBuild8756EF16-4rfo0GHQ0u9a
  • Commit ID: bb9b67a914f8af4d1e9e51f3cb20af4e1c175dc1
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

malachi-constant • Jul 22 '24 15:07

AWS CodeBuild CI Report

  • CodeBuild project: GitHubCodeBuild8756EF16-4rfo0GHQ0u9a
  • Commit ID: 3d720797b4a1fcdf50237550f2d89ed0ce6caab1
  • Result: FAILED
  • Build Logs (available for 30 days)

malachi-constant • Jul 22 '24 15:07

AWS CodeBuild CI Report

  • CodeBuild project: GitHubCodeBuild8756EF16-4rfo0GHQ0u9a
  • Commit ID: 8381bc08dadadf2dcd55de07476b510ccb38c51e
  • Result: FAILED
  • Build Logs (available for 30 days)

malachi-constant • Jul 22 '24 16:07

AWS CodeBuild CI Report

  • CodeBuild project: GitHubDistributedCodeBuild6-jWcl5DLmvupS
  • Commit ID: 8381bc08dadadf2dcd55de07476b510ccb38c51e
  • Result: FAILED
  • Build Logs (available for 30 days)

malachi-constant • Jul 22 '24 16:07

AWS CodeBuild CI Report

  • CodeBuild project: GitHubCodeBuild8756EF16-4rfo0GHQ0u9a
  • Commit ID: bc4a6f2104b92fc551a4b6f4a0a128a1fe26471c
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

malachi-constant • Jul 22 '24 17:07

AWS CodeBuild CI Report

  • CodeBuild project: GitHubDistributedCodeBuild6-jWcl5DLmvupS
  • Commit ID: bc4a6f2104b92fc551a4b6f4a0a128a1fe26471c
  • Result: FAILED
  • Build Logs (available for 30 days)

malachi-constant • Jul 22 '24 18:07

AWS CodeBuild CI Report

  • CodeBuild project: GitHubCodeBuild8756EF16-4rfo0GHQ0u9a
  • Commit ID: 0cd9098630b354ef341c9d4c8ad400a3ee298b2c
  • Result: FAILED
  • Build Logs (available for 30 days)

malachi-constant • Aug 11 '24 16:08

AWS CodeBuild CI Report

  • CodeBuild project: GitHubCodeBuild8756EF16-4rfo0GHQ0u9a
  • Commit ID: 7be568b67bcdaf29881e8453c76b19113c7e7968
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

malachi-constant • Aug 11 '24 17:08

AWS CodeBuild CI Report

  • CodeBuild project: GitHubDistributedCodeBuild6-jWcl5DLmvupS
  • Commit ID: 0cd9098630b354ef341c9d4c8ad400a3ee298b2c
  • Result: FAILED
  • Build Logs (available for 30 days)

malachi-constant • Aug 11 '24 17:08

AWS CodeBuild CI Report

  • CodeBuild project: GitHubDistributedCodeBuild6-jWcl5DLmvupS
  • Commit ID: 7be568b67bcdaf29881e8453c76b19113c7e7968
  • Result: FAILED
  • Build Logs (available for 30 days)

malachi-constant • Aug 11 '24 17:08