
DATAPLT-1268 Add shortcut for Iceberg-backed Glue Table

Open · jeffhiltz opened this issue 3 weeks ago

Manual Configuration

Optimizer Configuration

The Table Optimizer API has a number of configuration options that are not exposed in CloudFormation.

CompactionConfiguration

CompactionConfiguration can only be set through API calls (e.g., the CLI) after the resource has been created. This shortcut can enable compaction, but it cannot configure it. The default configuration may be sufficient in many cases; the following options require manual configuration after creation:

  • strategy: binpack (the default), sort, or z-order. Using sort or z-order requires the table's sort order to be set manually via Spark SQL.
  • minInputFiles: minimum number of files required to initiate a compaction, default is 100
  • deleteFileThreshold: minimum number of deletes that must be present in a data file to make it eligible for compaction, default is 1
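A minimal sketch of that post-creation step in Python, assuming a boto3 Glue client whose service model includes the `minInputFiles` and `deleteFileThreshold` fields of `update_table_optimizer` (older botocore releases may not). The helper names and example identifiers are hypothetical, and restating `roleArn` and `enabled` on update is an assumption, not something this issue confirms. Building the request as a plain dict keeps it inspectable before anything is sent to AWS.

```python
from typing import Any, Dict

# Strategies accepted by Glue compaction; sort and z-order also need a
# sort order set on the table via Spark SQL.
VALID_STRATEGIES = ("binpack", "sort", "z-order")


def build_compaction_request(
    catalog_id: str,
    database: str,
    table: str,
    role_arn: str,
    strategy: str = "binpack",
    min_input_files: int = 100,
    delete_file_threshold: int = 1,
) -> Dict[str, Any]:
    """Hypothetical helper: kwargs for glue.update_table_optimizer(Type="compaction")."""
    if strategy not in VALID_STRATEGIES:
        raise ValueError(f"unsupported strategy: {strategy!r}")
    if min_input_files < 1 or delete_file_threshold < 1:
        raise ValueError("thresholds must be positive")
    return {
        "CatalogId": catalog_id,
        "DatabaseName": database,
        "TableName": table,
        "Type": "compaction",
        "TableOptimizerConfiguration": {
            # Assumption: the role and enabled flag are restated on update.
            "roleArn": role_arn,
            "enabled": True,
            "compactionConfiguration": {
                "icebergConfiguration": {
                    "strategy": strategy,
                    "minInputFiles": min_input_files,
                    "deleteFileThreshold": delete_file_threshold,
                }
            },
        },
    }


def apply_optimizer_request(glue_client: Any, request: Dict[str, Any]) -> Any:
    """Send the request; glue_client is expected to be boto3.client("glue")."""
    return glue_client.update_table_optimizer(**request)
```

For example, `apply_optimizer_request(boto3.client("glue"), build_compaction_request("123456789012", "analytics", "events", role_arn, min_input_files=50))` would lower the file threshold while keeping binpack. From the CLI, the same nested structure would go to `aws glue update-table-optimizer --table-optimizer-configuration`.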

OrphanFileDeletionConfiguration

CloudFormation includes support for setting the OrphanFileRetentionPeriodInDays property, but the following must be set using the API/CLI:

  • location: a sub-directory in which to look for files, default is the table location
  • runRateInHours: interval in hours between orphan file deletion job runs, default is 24
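These two options can be set with the Glue `UpdateTableOptimizer` API (`update_table_optimizer` in boto3). A minimal sketch, assuming boto3's request shape places `location` and `runRateInHours` under `orphanFileDeletionConfiguration.icebergConfiguration`; the helper name and example values are hypothetical, and restating the retention period and role on update is an assumption. `location` is omitted when not given so that the default (the table location) applies.

```python
from typing import Any, Dict, Optional


def build_orphan_file_deletion_request(
    catalog_id: str,
    database: str,
    table: str,
    role_arn: str,
    retention_days: int,
    location: Optional[str] = None,
    run_rate_in_hours: int = 24,
) -> Dict[str, Any]:
    """Hypothetical helper: kwargs for glue.update_table_optimizer(Type="orphan_file_deletion")."""
    if run_rate_in_hours < 1:
        raise ValueError("run_rate_in_hours must be at least 1")
    iceberg_config: Dict[str, Any] = {
        # Assumption: the update replaces the whole configuration, so the
        # CloudFormation-managed retention period is restated here.
        "orphanFileRetentionPeriodInDays": retention_days,
        "runRateInHours": run_rate_in_hours,
    }
    # Leave location out to keep the default (the table location).
    if location is not None:
        iceberg_config["location"] = location
    return {
        "CatalogId": catalog_id,
        "DatabaseName": database,
        "TableName": table,
        "Type": "orphan_file_deletion",
        "TableOptimizerConfiguration": {
            "roleArn": role_arn,  # assumption: restated on update
            "enabled": True,
            "orphanFileDeletionConfiguration": {"icebergConfiguration": iceberg_config},
        },
    }
```

The returned dict would be passed as `glue_client.update_table_optimizer(**request)` with a boto3 Glue client.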

RetentionConfiguration

CloudFormation includes support for setting the CleanExpiredFiles, NumberOfSnapshotsToRetain, and SnapshotRetentionPeriodInDays properties, but the following must be set using the API/CLI:

  • runRateInHours: interval in hours between retention job runs, default is 24
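A minimal sketch of setting the retention job's run rate, assuming boto3's `update_table_optimizer` request shape with `runRateInHours` under `retentionConfiguration.icebergConfiguration`. The helper name is hypothetical, and restating the three CloudFormation-managed values (and the role) on update is an assumption made to avoid accidentally resetting them.

```python
from typing import Any, Dict


def build_retention_request(
    catalog_id: str,
    database: str,
    table: str,
    role_arn: str,
    snapshot_retention_days: int,
    snapshots_to_retain: int,
    clean_expired_files: bool,
    run_rate_in_hours: int = 24,
) -> Dict[str, Any]:
    """Hypothetical helper: kwargs for glue.update_table_optimizer(Type="retention")."""
    if run_rate_in_hours < 1:
        raise ValueError("run_rate_in_hours must be at least 1")
    return {
        "CatalogId": catalog_id,
        "DatabaseName": database,
        "TableName": table,
        "Type": "retention",
        "TableOptimizerConfiguration": {
            "roleArn": role_arn,  # assumption: restated on update
            "enabled": True,
            "retentionConfiguration": {
                "icebergConfiguration": {
                    # Assumption: the update replaces the whole configuration,
                    # so the CloudFormation-managed values are restated.
                    "snapshotRetentionPeriodInDays": snapshot_retention_days,
                    "numberOfSnapshotsToRetain": snapshots_to_retain,
                    "cleanExpiredFiles": clean_expired_files,
                    # The setting CloudFormation does not expose.
                    "runRateInHours": run_rate_in_hours,
                }
            },
        },
    }
```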

Sort Order

Sort order can only be set using Spark SQL. TODO: add details
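Pending those details, a hedged sketch: Iceberg's Spark SQL extensions provide `ALTER TABLE ... WRITE ORDERED BY`, which sets a table's sort order. The helper below only builds the statement; its name, the catalog name `glue_catalog`, and the column names are hypothetical. Running the statement assumes a Spark session configured with `org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions` and an Iceberg catalog backed by Glue.

```python
from typing import List, Sequence, Tuple

# (column, direction) pairs, e.g. ("event_date", "ASC").
SortKey = Tuple[str, str]


def build_write_order_sql(catalog: str, database: str, table: str, keys: Sequence[SortKey]) -> str:
    """Build an Iceberg Spark SQL statement that sets the table's write sort order.

    Identifiers are interpolated unquoted, so only pass trusted names.
    """
    if not keys:
        raise ValueError("at least one sort key is required")
    terms: List[str] = []
    for column, direction in keys:
        direction = direction.upper()
        if direction not in ("ASC", "DESC"):
            raise ValueError(f"invalid direction for {column}: {direction}")
        terms.append(f"{column} {direction}")
    return f"ALTER TABLE {catalog}.{database}.{table} WRITE ORDERED BY {', '.join(terms)}"
```

With a configured session, `spark.sql(build_write_order_sql("glue_catalog", "analytics", "events", [("event_date", "asc"), ("user_id", "desc")]))` would run `ALTER TABLE glue_catalog.analytics.events WRITE ORDERED BY event_date ASC, user_id DESC`.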

Testing

TODO:

  • use the shortcut to create some tables and use them
  • make sure that example Spark SQL code works for setting order (and that the table keeps working)
  • try making a table that uses bucketing (we shouldn't need to do anything extra to support that, right? Isn't it part of the partition definition?)
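For the bucketing test: in Iceberg, bucketing is expressed as a partition transform (`bucket(N, col)`), which suggests it does live in the partition definition; whether the shortcut's partition input can express that transform is what the test should confirm. A hedged sketch for creating a comparison table with Spark SQL follows; the helper, catalog, database, and column names are hypothetical, and it assumes an Iceberg catalog for Glue is configured in the Spark session.

```python
from typing import Sequence, Tuple

# (column name, Spark SQL type) pairs, e.g. ("id", "bigint").
Column = Tuple[str, str]


def build_bucketed_table_sql(
    catalog: str,
    database: str,
    table: str,
    columns: Sequence[Column],
    bucket_column: str,
    num_buckets: int,
) -> str:
    """Build an Iceberg CREATE TABLE that partitions rows into hash buckets of one column."""
    names = [name for name, _ in columns]
    if bucket_column not in names:
        raise ValueError(f"{bucket_column} is not a table column")
    if num_buckets < 1:
        raise ValueError("num_buckets must be positive")
    column_defs = ", ".join(f"{name} {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE TABLE {catalog}.{database}.{table} ({column_defs}) "
        f"USING iceberg PARTITIONED BY (bucket({num_buckets}, {bucket_column}))"
    )
```

`build_bucketed_table_sql("glue_catalog", "analytics", "bucket_test", [("id", "bigint"), ("data", "string")], "id", 16)` yields `CREATE TABLE glue_catalog.analytics.bucket_test (id bigint, data string) USING iceberg PARTITIONED BY (bucket(16, id))`, whose partition spec can then be compared with a table created by the shortcut.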

jeffhiltz, Dec 19 '25 16:12