
fix(athena): fixed write_iceberg which could fire an ICEBERG_TOO_MANY_OPEN_PARTITIONS Athena error

Open erwan-simon opened this issue 1 year ago • 6 comments

Feature or Bugfix

  • Bugfix

Detail

  • Athena: fixed write_iceberg which could fire an ICEBERG_TOO_MANY_OPEN_PARTITIONS Athena error

Relates

  • None

Details

This is not really a bugfix, given that the problem lies with Athena, not aws-sdk-pandas.

When you write an Iceberg table with partition keys through Athena and the write creates more than 100 distinct partitions, Athena fails with the error ICEBERG_TOO_MANY_OPEN_PARTITIONS (see post here). This limitation makes it really difficult to use Iceberg at scale with Athena. One workaround is of course to use Spark, but I think it would be great if aws-sdk-for-pandas included a workaround.

So here is my approach to solving this problem: simply chunk the dataframe so that each chunk contains at most 100 different partition key combinations, then write the chunks sequentially using Athena.
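
To illustrate the idea, here is a minimal sketch of the chunking step in plain pandas. It is not the code from this pull request: the helper name `iter_partition_chunks`, its parameters, and the column names in the example are made up for illustration, and the default limit of 100 is taken from the description above.

```python
import pandas as pd


def iter_partition_chunks(df, partition_cols, max_partitions=100):
    """Yield sub-dataframes that each hold at most `max_partitions`
    distinct combinations of the partition columns.

    Hypothetical helper for illustration only, not part of aws-sdk-pandas.
    """
    # Number every distinct key combination 0, 1, 2, ... in order of
    # first appearance; dropna=False keeps rows whose key contains NaN.
    group_ids = df.groupby(partition_cols, sort=False, dropna=False).ngroup()
    # Combinations 0..99 land in batch 0, 100..199 in batch 1, and so on.
    batch_ids = (group_ids // max_partitions).to_numpy()
    for _, chunk in df.groupby(batch_ids):
        yield chunk


# Example with made-up data: 250 distinct (region, day) combinations.
example = pd.DataFrame(
    {
        "region": [f"r{i % 5}" for i in range(1000)],
        "day": [i % 50 for i in range(1000)],
        "value": range(1000),
    }
)
chunks = list(iter_partition_chunks(example, ["region", "day"]))
```

Each yielded chunk would then be written in turn with the library's Iceberg write call, so that no single Athena query has to open more than 100 partitions. All rows belonging to one key combination stay in the same chunk, so a partition is never split across two writes.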

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

erwan-simon avatar Jul 01 '24 10:07 erwan-simon