dataform

overwrite filter strategy

Open • stankiewicz opened this issue 3 years ago • 0 comments

Goal: bring more flexibility, in the form of additional strategies, to incremental models. `updatePartitionFilter` alone is not enough. Add an insert-overwrite strategy that runs a DML `DELETE` before the model executes. The `DELETE` statement should select the partitions to remove either dynamically or from a static input.

Why it's needed: customers compute expensive aggregates. Sometimes neither the input nor the output has unique keys, so the incremental model can only append, and relying on a pre-statement (for example a `pre_operations` block) to delete the rows being replaced is error-prone.

Suggested solution: the adapter for incremental tables should support:

  • Append-only (especially when no keys are provided)
  • A `MERGE` statement when keys are provided
  • Insert overwrite via `DELETE FROM` followed by `INSERT`
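The three strategies above could map to SQL roughly as follows. This is a minimal TypeScript sketch, not Dataform's implementation: the `IncrementalConfig` shape, its field names, `resolveStrategy`, and `buildStatements` are all hypothetical, and the generated statements assume BigQuery syntax. When no strategy is set explicitly, it falls back to append-only without keys and `MERGE` with keys, as the list suggests.

```typescript
// Hypothetical sketch: none of these names are part of Dataform's API.
type Strategy = "append" | "merge" | "insert_overwrite";

interface IncrementalConfig {
  target: string;              // target table, e.g. "dataset.daily_agg"
  selectSql: string;           // the model's SELECT that produces the new rows
  columns: string[];           // output columns, in SELECT order
  uniqueKey?: string[];        // merge keys, if the output has any
  strategy?: Strategy;         // explicit strategy; inferred when omitted
  overwritePredicate?: string; // WHERE clause used by insert_overwrite
}

// No keys -> append only; keys -> MERGE (unless a strategy is set explicitly).
function resolveStrategy(cfg: IncrementalConfig): Strategy {
  if (cfg.strategy) return cfg.strategy;
  return cfg.uniqueKey && cfg.uniqueKey.length > 0 ? "merge" : "append";
}

// Returns the statements to run, in order, for one incremental execution.
function buildStatements(cfg: IncrementalConfig): string[] {
  const cols = cfg.columns.join(", ");
  const insert = `INSERT INTO ${cfg.target} (${cols})\n${cfg.selectSql}`;
  const strategy = resolveStrategy(cfg);
  switch (strategy) {
    case "append":
      return [insert];
    case "merge": {
      if (!cfg.uniqueKey || cfg.uniqueKey.length === 0) {
        throw new Error("merge strategy requires uniqueKey");
      }
      const on = cfg.uniqueKey.map((k) => `T.${k} = S.${k}`).join(" AND ");
      const set = cfg.columns.map((c) => `${c} = S.${c}`).join(", ");
      const values = cfg.columns.map((c) => `S.${c}`).join(", ");
      return [
        `MERGE ${cfg.target} T\nUSING (${cfg.selectSql}) S\nON ${on}\n` +
          `WHEN MATCHED THEN UPDATE SET ${set}\n` +
          `WHEN NOT MATCHED THEN INSERT (${cols}) VALUES (${values})`,
      ];
    }
    case "insert_overwrite": {
      if (!cfg.overwritePredicate) {
        throw new Error("insert_overwrite requires an overwrite predicate");
      }
      // Delete the rows being replaced, then insert the fresh ones.
      return [`DELETE FROM ${cfg.target} WHERE ${cfg.overwritePredicate}`, insert];
    }
    default: {
      const unhandled: never = strategy;
      throw new Error(`unknown strategy: ${String(unhandled)}`);
    }
  }
}
```

As written, the insert-overwrite pair runs as two separate statements, so readers briefly see the target without the deleted rows. Wrapping both in a BigQuery multi-statement transaction (`BEGIN TRANSACTION;` and `COMMIT TRANSACTION;`) would close that window.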

Partitioning should not be enforced: some SCD tables, such as those in a Data Vault model, may be clustered only.

The insert-overwrite strategy allows setting an `overwrite_filter`:

  • Default (empty): when `partition_by` is set, a DML `DELETE FROM` runs with a predicate on the `partition_by` columns; otherwise the run fails
  • Custom, e.g. `overwrite_filter = "current_date()"` or `overwrite_filter = ${dataform.projectConfig.vars.date}`
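One way to turn `overwrite_filter` into the `DELETE` that precedes the insert is sketched below in TypeScript. The option names (`partitionBy`, `overwriteFilter`, `newRowsSql`) and `buildOverwriteDelete` are hypothetical. Because both custom examples in the list (`current_date()` and a project variable) read like partition values rather than full predicates, the sketch assumes a custom filter is compared against the `partition_by` column; the issue does not pin this down. It also assumes `partition_by` names a plain column, not an expression such as `DATE(ts)`.

```typescript
// Hypothetical option names mirroring the proposal; not an existing Dataform API.
interface OverwriteOptions {
  target: string;           // target table, e.g. "dataset.daily_agg"
  newRowsSql: string;       // SELECT producing the rows about to be inserted
  partitionBy?: string;     // assumed to be a plain column name
  overwriteFilter?: string; // assumed to be a partition value expression
}

function buildOverwriteDelete(opts: OverwriteOptions): string {
  if (!opts.partitionBy) {
    // Mirrors "otherwise it will fail": there is no column to filter on.
    throw new Error("insert_overwrite requires partition_by in this sketch");
  }
  const col = opts.partitionBy;
  const filter = opts.overwriteFilter?.trim();
  if (filter) {
    // Static input: overwrite only the partition named by the filter.
    return `DELETE FROM ${opts.target} WHERE ${col} = ${filter}`;
  }
  // Default (empty filter): derive the partitions dynamically from the new rows.
  return (
    `DELETE FROM ${opts.target} WHERE ${col} IN ` +
    `(SELECT DISTINCT ${col} FROM (${opts.newRowsSql}))`
  );
}
```

The default path deletes only the partitions that appear in the incoming rows (the "dynamic" selection from the goal), while a custom filter gives the "static" one. Because `IN` never matches `NULL`, target rows whose partition value is `NULL` are not removed by the dynamic predicate.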

stankiewicz, May 27 '22 15:05