dbt-databricks
Advanced MERGE incremental settings
Describe the feature
Currently, merge conditions are not supported in the merge incremental strategy.
In Databricks terminology, merge conditions are the matched_condition and not_matched_condition clauses of the MERGE definition.
Taken from the Databricks docs:
MERGE INTO target_table_name [target_alias]
USING source_table_reference [source_alias]
ON merge_condition
{ WHEN MATCHED [ AND matched_condition ] THEN matched_action |
WHEN NOT MATCHED [ AND not_matched_condition ] THEN not_matched_action } [...]
As of now, the full power of the MERGE statement is not supported.
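For illustration, the kind of statement this feature would let dbt generate could look like the sketch below (table names, column names, and conditions are made up for the example; the AND clauses are exactly the matched / not-matched conditions the current merge strategy cannot express):

MERGE INTO analytics.dim_entity AS t
USING staging.dim_entity_updates AS s
ON t.entity_id = s.entity_id
-- update only when the payload actually changed
WHEN MATCHED AND s.properties <> t.properties THEN
  UPDATE SET properties = s.properties, updated_at = s.updated_at
-- insert only records that pass an additional business check
WHEN NOT MATCHED AND s.is_active = true THEN
  INSERT (entity_id, properties, updated_at)
  VALUES (s.entity_id, s.properties, s.updated_at)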
Describe alternatives you've considered
Writing a customized adapter with its own merge macro to perform such operations, and adding the conditions to the model configuration.
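One possible shape for that configuration, purely as a discussion sketch: the matched_condition / not_matched_condition keys below are hypothetical names, not existing dbt-databricks config, while DBT_INTERNAL_SOURCE / DBT_INTERNAL_DEST are the aliases dbt's default merge macro uses for the source and target relations.

{{
  config(
    materialized='incremental',
    incremental_strategy='merge',
    unique_key='entity_id',
    matched_condition='DBT_INTERNAL_SOURCE.properties <> DBT_INTERNAL_DEST.properties',
    not_matched_condition='DBT_INTERNAL_SOURCE.is_active = true'
  )
}}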
Additional context
Databricks is more and more often used to support a variety of data operations that previously relied on analytical MPPs: specifically, preparing DWH-like normalized models for long-term entity-oriented storage and building denormalized data marts.
One of the drawbacks of MERGE is that an update is essentially a DELETE + INSERT, so every updated record carries extra cost. It is therefore a widely used technique to minimise such updates with additional conditions that apply only to already observed (matched) records, and thus cannot be expressed as merge predicates (in dbt terminology).
Moreover, there are business cases that require this. For example, I track an object and its properties as an entry of the form <o, P>. Once I have observed a specific instance of such a record (<1, A>), I may want to skip all subsequent updates as long as the properties do not change, and only perform the update when I see, for example, <1, B>.
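In MERGE terms that rule is just a matched condition on the property column, something like (column names assumed):

WHEN MATCHED AND s.P <> t.P THEN UPDATE SET P = s.P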
Who will this benefit?
Anyone who uses MERGE as a strategy for model updates and requires additional business checks on the records before making updates / inserts.
Are you interested in contributing this feature?
Probably, but given the variety of possible solutions I would appreciate discussing the solution design with the maintainers first.
Any updates on this? Seems like an interesting feature.
No updates. Just to note that there is a workaround that currently might work: with dbt you have to manually rebuild the whole update set in your model. That makes the model code unbearably large, but it works.
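Roughly, the workaround is an incremental model that filters out unchanged rows itself, so the matched condition is emulated in the model body before dbt runs its MERGE (names below are illustrative):

{{ config(materialized='incremental', incremental_strategy='merge', unique_key='entity_id') }}

with src as (
    select entity_id, properties, updated_at
    from {{ ref('stg_entities') }}
)

select src.*
from src
{% if is_incremental() %}
left join {{ this }} as tgt
    on src.entity_id = tgt.entity_id
where tgt.entity_id is null                -- brand new records
   or src.properties <> tgt.properties     -- records whose payload actually changed
{% endif %}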
For me this just highlights the cases where the dbt modelling approach falls short. I also think null-aware joins are missing as a feature: MERGE is frequently used for denormalized "cubes" where missing dimensions are allowed to be NULL, so using <=> instead of = in the merge predicates is also required. And I don't know a workaround for this.
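For example, on a mart keyed by nullable dimensions the desired join condition would be something like (column names assumed):

ON t.dim_country <=> s.dim_country
AND t.dim_channel <=> s.dim_channel

whereas the = comparison that dbt generates today never matches rows where a dimension is NULL on both sides.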
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue.