iceberg-python
Table commit retries based on table properties
Created a decorator which, when applied to a function that performs commits, retries that function on the table. It looks at the table properties and performs retries if the execution fails.
- Created a Decorator / Descriptor Class that can wrap a function and retry it using the Tenacity retry library
- The class configures defaults based on the documented defaults found in the Iceberg docs https://iceberg.apache.org/docs/latest/configuration/#table-behavior-properties
  - `commit.retry.num-retries`
  - `commit.retry.min-wait-ms`
  - `commit.retry.max-wait-ms`
  - `commit.retry.total-timeout-ms`
- Config is parsed from a configured "properties" attribute/property on the instance class that is accessed within the decorator at runtime
- A separate function, `table_commit_retry`, is used to capture the name of the attribute on the caller that should be used when looking up table configs.
- Access to the caller instance is performed by overriding the `__get__` method of the class
- Unparsable config will be ignored and defaults will be used
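The bullets above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `CommitRetry`, `parse_retry_config`, and `FakeTable` are hypothetical names, the default values are the ones I understand the linked Iceberg docs to list, and a plain loop stands in for Tenacity so the sketch has no third-party dependency (it does not implement the wait or timeout properties).

```python
from typing import Any, Callable, Dict

# Defaults as listed (to my understanding) in the Iceberg table-behavior docs.
DEFAULTS: Dict[str, int] = {
    "commit.retry.num-retries": 4,
    "commit.retry.min-wait-ms": 100,
    "commit.retry.max-wait-ms": 60_000,
    "commit.retry.total-timeout-ms": 1_800_000,
}


def parse_retry_config(properties: Dict[str, Any]) -> Dict[str, int]:
    """Read commit.retry.* values, falling back to the default when a value
    is missing or cannot be parsed as an integer."""
    config = {}
    for key, default in DEFAULTS.items():
        try:
            config[key] = int(properties[key])
        except (KeyError, TypeError, ValueError):
            config[key] = default
    return config


class CommitRetry:
    """Hypothetical descriptor: __get__ gives access to the caller instance."""

    def __init__(self, func: Callable[..., Any], properties_attr: str) -> None:
        self.func = func
        self.properties_attr = properties_attr

    def __get__(self, instance: Any, owner: Any = None) -> Any:
        if instance is None:
            return self

        def bound(*args: Any, **kwargs: Any) -> Any:
            # Properties are looked up at call time, not at decoration time.
            props = getattr(instance, self.properties_attr, {})
            config = parse_retry_config(props)
            attempts = config["commit.retry.num-retries"] + 1
            for attempt in range(1, attempts + 1):
                try:
                    return self.func(instance, *args, **kwargs)
                except Exception:
                    if attempt == attempts:
                        raise  # out of retries: surface the last error

        return bound


def table_commit_retry(properties_attr: str) -> Callable[[Callable[..., Any]], CommitRetry]:
    """Capture the name of the attribute that holds the table properties."""

    def decorator(func: Callable[..., Any]) -> CommitRetry:
        return CommitRetry(func, properties_attr)

    return decorator


class FakeTable:
    """Stand-in for a table; fails twice, then succeeds."""

    def __init__(self, properties: Dict[str, Any]) -> None:
        self.properties = properties
        self.calls = 0

    @table_commit_retry("properties")
    def commit(self) -> str:
        self.calls += 1
        if self.calls < 3:
            raise RuntimeError("simulated commit conflict")
        return "committed"
```

Calling `FakeTable({"commit.retry.min-wait-ms": "oops"}).commit()` in this sketch ignores the unparsable value and retries with the defaults until the third call succeeds.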
Closes: https://github.com/apache/iceberg-python/issues/269
So I made a large, fundamental change to the original design: catalogs now need to implement a function in which they declare which exceptions are retryable. This becomes the bridge between the Table and the Catalog. Since Table contains an instance of Catalog, our retry wrapper can grab this list of exceptions through the Table instance.
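A rough sketch of that bridge, under stated assumptions: the class names, the `CommitConflictError` exception, and the method signature below are hypothetical stand-ins chosen for illustration, not the actual PyIceberg API.

```python
from abc import ABC, abstractmethod
from typing import Tuple, Type


class CommitConflictError(Exception):
    """Hypothetical error a catalog might treat as retryable."""


class Catalog(ABC):
    @abstractmethod
    def commit_retry_exceptions(self) -> Tuple[Type[BaseException], ...]:
        """Each catalog declares which exceptions are worth retrying."""


class InMemoryCatalog(Catalog):
    def commit_retry_exceptions(self) -> Tuple[Type[BaseException], ...]:
        return (CommitConflictError,)


class Table:
    def __init__(self, catalog: Catalog) -> None:
        self.catalog = catalog

    @property
    def commit_retry_exceptions(self) -> Tuple[Type[BaseException], ...]:
        # The table only forwards what its catalog declares.
        return self.catalog.commit_retry_exceptions()


def is_retryable(table: Table, exc: BaseException) -> bool:
    """Predicate a retry wrapper could use to decide whether to try again."""
    return isinstance(exc, table.commit_retry_exceptions)
```

The retry wrapper only needs the `Table` instance; it never has to know which catalog implementation sits behind it.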
Retrying happens within the `Table` object and wraps the `_do_commit` function.
- Since `Table` calls this function, we can grab a reference to the `Table` object, which we can then use to load the table's `properties` and `commit_retry_exceptions`.
- With this information we can build the Retry Controller
- To support executing `refresh` before a new attempt but after sleeping, we grab the exception the attempt received, hold on to it, and then on the next attempt, before running `_do_commit`, we check whether the exception requires a refresh of the table.
  - I had to do this because Tenacity does not have an `after_sleep` parameter, even though it supports taking a `before_sleep` parameter.
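The "hold on to the exception, refresh at the start of the next attempt" idea can be sketched like this. It is an illustration only: `RefreshBeforeRetry`, `StaleMetadataError`, `FakeTable`, and `run_with_retries` are hypothetical, and the small driver loop stands in for Tenacity, which would call the wrapped function the same way after each sleep.

```python
import time
from typing import Any, Callable, List, Optional


class StaleMetadataError(Exception):
    """Hypothetical error that means the table must be refreshed."""


class FakeTable:
    """Stand-in table whose commit fails until refresh() has been called."""

    def __init__(self) -> None:
        self.stale = True
        self.events: List[str] = []

    def refresh(self) -> None:
        self.events.append("refresh")
        self.stale = False

    def _do_commit(self) -> str:
        self.events.append("commit")
        if self.stale:
            raise StaleMetadataError("table metadata is out of date")
        return "committed"


class RefreshBeforeRetry:
    """Remember the previous attempt's exception; act on it at the start of
    the next attempt, i.e. after the retry library has finished sleeping."""

    def __init__(self, table: FakeTable, needs_refresh: Callable[[Exception], bool]) -> None:
        self.table = table
        self.needs_refresh = needs_refresh
        self.last_exception: Optional[Exception] = None

    def __call__(self) -> Any:
        if self.last_exception is not None and self.needs_refresh(self.last_exception):
            self.table.refresh()
        self.last_exception = None
        try:
            return self.table._do_commit()
        except Exception as exc:
            self.last_exception = exc  # kept for the next attempt
            raise


def run_with_retries(func: Callable[[], Any], attempts: int, wait_seconds: float = 0.0) -> Any:
    """Tiny driver standing in for the retry library: attempt, sleep, repeat."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == attempts:
                raise
            time.sleep(wait_seconds)
```

Because the refresh check lives inside the callable itself, it runs after the sleep regardless of which hooks the retry library exposes.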