aws-sdk-pandas
fix(athena): fix write_iceberg, which could raise an ICEBERG_TOO_MANY_OPEN_PARTITIONS Athena error
Feature or Bugfix
- Bugfix
Detail
- Athena: fixed write_iceberg, which could raise an ICEBERG_TOO_MANY_OPEN_PARTITIONS Athena error
Relates
- None
Details
Strictly speaking, this is not a bugfix, since the limitation lies in Athena rather than in aws-sdk-pandas.
When you write to a partitioned Iceberg table with Athena and a single write creates more than 100 distinct partitions, Athena fails with the error ICEBERG_TOO_MANY_OPEN_PARTITIONS (see post here). This limitation makes it very difficult to use Iceberg with Athena at scale. One workaround is of course to use Spark, but I think it would be great if aws-sdk-pandas provided a workaround of its own.
My proposed solution is simple: split the DataFrame into chunks that each contain at most 100 distinct partition-key combinations, and write those chunks sequentially with Athena.
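The chunking idea can be sketched as follows. This is a minimal illustration under the same identity-partitioning assumption, not the code in this PR, and `chunk_by_partitions` is a hypothetical helper. It numbers each distinct key combination in first-seen order, groups those combinations into batches of at most `max_partitions`, and yields one sub-DataFrame per batch, so every row of a given partition lands in the same chunk.

```python
from collections.abc import Iterator

import pandas as pd


def chunk_by_partitions(
    df: pd.DataFrame, partition_cols: list[str], max_partitions: int = 100
) -> Iterator[pd.DataFrame]:
    """Yield sub-DataFrames that each span at most `max_partitions` key combinations."""
    if max_partitions < 1:
        raise ValueError("max_partitions must be >= 1")
    # One row per distinct partition-key combination, in first-seen order.
    keys = df[partition_cols].drop_duplicates().reset_index(drop=True)
    # Consecutive key combinations share a chunk id: 0..max_partitions-1 -> 0, and so on.
    keys["__chunk_id__"] = keys.index // max_partitions
    # Keys are unique, so a left merge tags every row once without changing the row count.
    tagged = df.merge(keys, on=partition_cols, how="left")
    for _, chunk in tagged.groupby("__chunk_id__", sort=True):
        yield chunk.drop(columns="__chunk_id__")
```

Each yielded chunk would then be written in its own Athena query, one after another (for example through `wr.athena.to_iceberg` with the same `partition_cols`). One consequence of this design is that the overall write is no longer a single commit: if a later chunk fails, the earlier chunks have already been written to the table.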
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.