
Table commit retries based on table properties

Open Buktoria opened this issue 1 year ago • 1 comments

Created a decorator that, when applied to a function that performs commits on a table, retries that function. It reads the retry settings from the table properties and retries if the execution fails.

  • Created a Decorator / Descriptor Class that can wrap a function and retry it using the Tenacity retry library
  • The class configures defaults based on the documented defaults found in the Iceberg docs https://iceberg.apache.org/docs/latest/configuration/#table-behavior-properties
    • commit.retry.num-retries
    • commit.retry.min-wait-ms
    • commit.retry.max-wait-ms
    • commit.retry.total-timeout-ms
  • Config is parsed from a configured "properties" attribute/property on the instance class that is accessed within the decorator at runtime
  • A separate function, table_commit_retry, is used to capture the name of the attribute on the caller that should be used when looking up table configs.
  • Access to the caller instance is performed via overloading the __get__ method of the class
  • Un-parsable config will be ignored and defaults will be used
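
A minimal sketch of how the pieces listed above could fit together. This is not the code from this PR: it swaps Tenacity for a plain loop so it runs with only the standard library, and the names CommitRetry and parse_retry_config, the injectable sleep argument, and the wait-doubling backoff are all assumptions made for illustration. The default values are the ones listed in the Iceberg docs linked above.

```python
import time
from typing import Any, Callable, Dict, Mapping

# Defaults from the Iceberg "Table behavior properties" documentation.
DEFAULTS: Dict[str, int] = {
    "commit.retry.num-retries": 4,
    "commit.retry.min-wait-ms": 100,
    "commit.retry.max-wait-ms": 60_000,
    "commit.retry.total-timeout-ms": 1_800_000,
}


def parse_retry_config(properties: Mapping[str, Any]) -> Dict[str, int]:
    """Read the retry settings; a missing or un-parsable value falls back to its default."""
    config = {}
    for key, default in DEFAULTS.items():
        try:
            config[key] = int(properties[key])
        except (KeyError, TypeError, ValueError):
            config[key] = default
    return config


class CommitRetry:
    """Hypothetical descriptor that retries the wrapped method."""

    def __init__(self, func: Callable, properties_attr: str, sleep: Callable[[float], None]):
        self.func = func
        self.properties_attr = properties_attr
        self.sleep = sleep

    def __get__(self, instance, owner=None):
        # Attribute access on an instance lands here, which gives us the caller.
        if instance is None:
            return self

        def bound(*args, **kwargs):
            return self._call(instance, *args, **kwargs)

        return bound

    def _call(self, instance, *args, **kwargs):
        # Properties are looked up at call time, so changes to them are picked up.
        config = parse_retry_config(getattr(instance, self.properties_attr, None) or {})
        retries = config["commit.retry.num-retries"]
        wait_ms = config["commit.retry.min-wait-ms"]
        deadline = time.monotonic() + config["commit.retry.total-timeout-ms"] / 1000
        for attempt in range(retries + 1):
            try:
                return self.func(instance, *args, **kwargs)
            except Exception:
                # Assumption: every exception is retried in this sketch.
                if attempt == retries or time.monotonic() >= deadline:
                    raise
                self.sleep(wait_ms / 1000)
                # Assumption: double the wait each time, capped at max-wait-ms.
                wait_ms = min(wait_ms * 2, config["commit.retry.max-wait-ms"])


def table_commit_retry(properties_attr: str, sleep: Callable[[float], None] = time.sleep):
    """Capture the attribute name that holds the table properties on the caller."""

    def decorator(func: Callable) -> CommitRetry:
        return CommitRetry(func, properties_attr, sleep)

    return decorator
```

A method decorated with `@table_commit_retry("properties")` would then read `self.properties` each time it is called and retry according to whatever it finds there.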

Closes: https://github.com/apache/iceberg-python/issues/269

Buktoria, Jan 30 '24 21:01

I made a fundamental change to the original design: catalogs now need to implement a function that declares which exceptions are retryable. This becomes the bridge between the Table and the Catalog. Since a Table holds an instance of its Catalog, the retry wrapper can get this list of exceptions through the Table instance.
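
Roughly, the bridge could look like the sketch below. The classes here are simplified stand-ins rather than the real pyiceberg Catalog and Table, the exception is defined locally for the example, and whether commit_retry_exceptions ends up as a method or a property is an assumption.

```python
from abc import ABC, abstractmethod
from typing import Tuple, Type


class CommitFailedException(Exception):
    """Local stand-in for a commit conflict error."""


class Catalog(ABC):
    """Simplified stand-in: each catalog declares which errors are worth retrying."""

    @abstractmethod
    def commit_retry_exceptions(self) -> Tuple[Type[Exception], ...]:
        ...


class ExampleCatalog(Catalog):
    """Hypothetical catalog that only retries commit conflicts."""

    def commit_retry_exceptions(self) -> Tuple[Type[Exception], ...]:
        return (CommitFailedException,)


class Table:
    """Simplified stand-in: the table holds its catalog, so it can reach the list."""

    def __init__(self, catalog: Catalog):
        self.catalog = catalog

    def is_retryable(self, error: Exception) -> bool:
        # The retry wrapper only needs the Table to answer this question.
        return isinstance(error, self.catalog.commit_retry_exceptions())
```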

Retrying happens within the Table object and wraps the _do_commit function.

  • Since Table calls this function, we can grab a reference to the Table object which we can then use to load the table's properties and commit_retry_exceptions.
  • With this information we can build the retry controller.
  • To support refreshing the table after sleeping but before the next attempt, we hold on to the exception raised by the failed attempt. At the start of the next attempt, before running _do_commit, we check whether that exception requires a refresh of the table.
    • I had to do this because Tenacity does not have an after_sleep parameter, even though it supports a before_sleep parameter.
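
The deferred refresh described above can be sketched as follows. This uses a plain loop instead of Tenacity to keep the example self-contained, and the Table stand-in, needs_refresh, commit_with_retry, and the fixed wait are hypothetical names and simplifications, not the code in this PR.

```python
import time
from typing import Callable, List, Optional


class CommitFailedException(Exception):
    """Local stand-in for a commit conflict caused by stale table metadata."""


class Table:
    """Simplified stand-in that fails a fixed number of commits, then succeeds."""

    def __init__(self, failures: int):
        self.events: List[str] = []
        self._failures = failures

    def refresh(self) -> None:
        self.events.append("refresh")

    def _do_commit(self) -> str:
        self.events.append("commit")
        if self._failures > 0:
            self._failures -= 1
            raise CommitFailedException("stale metadata")
        return "committed"


def needs_refresh(error: Exception) -> bool:
    # Assumption: only commit conflicts require reloading the table.
    return isinstance(error, CommitFailedException)


def commit_with_retry(
    table: Table,
    retries: int = 4,
    wait_s: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    last_error: Optional[Exception] = None
    for attempt in range(retries + 1):
        # Runs after the previous sleep and before the commit, which is the
        # slot an after_sleep hook would have covered.
        if last_error is not None and needs_refresh(last_error):
            table.refresh()
        try:
            return table._do_commit()
        except CommitFailedException as error:
            if attempt == retries:
                raise
            last_error = error  # remember it for the start of the next attempt
            sleep(wait_s)
    raise AssertionError("unreachable")
```

The refresh is skipped on the first attempt and is never run after the final failure, so it only happens when another commit is actually going to be tried.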

Buktoria, Mar 18 '24 18:03