Support for partition overwrite with Delta

Open dejan opened this issue 3 years ago • 7 comments

Describe the feature

This was reported in https://github.com/dbt-labs/dbt-spark/issues/155, but I think you might be more interested in resolving the issue.

Currently, the insert_overwrite strategy throws an error if the file format is set to delta because it doesn't support dynamic partition overwrite.

Delta already supports partition overwrite, but the dbt adapter implementation does not seem to make use of it.
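
To make the requested behaviour concrete, here is a minimal sketch in plain Python (no Spark or Delta involved) of the difference between a static overwrite and a dynamic partition overwrite. The table is modelled as a dict keyed by partition value; the function names, the `day` column, and the sample rows are all hypothetical and only illustrate the semantics, not how any adapter implements them.

```python
from collections import defaultdict


def group_by_partition(rows, key):
    """Group rows (dicts) by the value of the partition column."""
    parts = defaultdict(list)
    for row in rows:
        parts[row[key]].append(row)
    return dict(parts)


def static_overwrite(table, new_rows, key):
    """Static mode: the whole table is replaced by the incoming rows."""
    return group_by_partition(new_rows, key)


def dynamic_overwrite(table, new_rows, key):
    """Dynamic mode: only partitions present in the incoming rows are replaced."""
    result = {part: list(rows) for part, rows in table.items()}
    result.update(group_by_partition(new_rows, key))
    return result


# Hypothetical table with two daily partitions.
existing = group_by_partition(
    [
        {"day": "2022-04-09", "clicks": 10},
        {"day": "2022-04-10", "clicks": 20},
    ],
    "day",
)
# Incoming data only touches the most recent partition.
incoming = [{"day": "2022-04-10", "clicks": 25}]

static_result = static_overwrite(existing, incoming, "day")
dynamic_result = dynamic_overwrite(existing, incoming, "day")

print(sorted(static_result))   # only the incoming partition survives
print(sorted(dynamic_result))  # the untouched partition is kept as well
```

In the dynamic case the 2022-04-09 partition is left alone and only 2022-04-10 is rewritten, which is the behaviour an incremental model that reprocesses recent partitions needs.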

Describe alternatives you've considered

I could not find a way to atomically overwrite a partition.

Who will this benefit?

Everyone using dbt and Delta.

dejan avatar Apr 11 '22 08:04 dejan

@dejan a quick update on this. The Delta folks at Databricks are looking at supporting dynamic partition overwrite. It's prioritized on their roadmap; I'll post back here once it's released.

bilalaslamseattle avatar May 19 '22 12:05 bilalaslamseattle

Thanks @bilalaslamseattle !

dejan avatar May 19 '22 13:05 dejan

Any updates on this?

creativedutchmen avatar Jul 20 '22 12:07 creativedutchmen

@creativedutchmen this capability is in Delta Lake 2.0. We now have to implement it in dbt-databricks. It's on our radar. @superdupershant @ueshin and @allisonwang-db FYI.

bilalaslamseattle avatar Jul 25 '22 06:07 bilalaslamseattle

+1 on this. Adding one data point: this will be a blocker for us in adopting dbt.

lwbayes avatar Aug 02 '22 06:08 lwbayes

Great, thanks! This will greatly reduce the runtime of some of our heaviest models :)

creativedutchmen avatar Aug 03 '22 08:08 creativedutchmen

I submitted a PR to include it in dbt-spark if you want to look at it: https://github.com/dbt-labs/dbt-spark/pull/430

flvndh avatar Aug 12 '22 08:08 flvndh

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue.

github-actions[bot] avatar Feb 09 '23 02:02 github-actions[bot]

Still relevant for me

creativedutchmen avatar Feb 09 '23 07:02 creativedutchmen

@bilalaslamseattle please share some updates on this.

dejan avatar Feb 09 '23 08:02 dejan

This issue has been addressed by #310.

andrefurlan-db avatar Jun 15 '23 03:06 andrefurlan-db

I don't think it was a good decision to close this, because #310 has a major flaw which was not resolved (only documented): https://github.com/databricks/dbt-databricks/issues/334

dejan avatar Jan 24 '24 15:01 dejan

I haven't confirmed this, but according to the documentation, insert_overwrite apparently now supports dynamic partition overwrite (it no longer errors for Delta). However, the documentation states that it only works on All-purpose clusters, which is also a major drawback because that's not cost-efficient.

Can someone please confirm the status quo and the plans for properly supporting a case as basic and common as partition overwrite?

dejan avatar Jan 25 '24 07:01 dejan

"however it is stated that it only works for All-purpose clusters which is also a major drawback as that's not cost-efficient."

@andrefurlan-db can you confirm if this is correct? Seems odd.

bilalaslamseattle avatar Jan 27 '24 09:01 bilalaslamseattle

@bilalaslamseattle / @andrefurlan-db was there an update on this one or a thread somewhere else? When I run partition overwrite against SQL warehouses, I get:

Error running query: [_LEGACY_ERROR_TEMP_DBR_0222] org.apache.spark.sql.catalyst.ExtendedAnalysisException: Configuration spark.sql.sources.partitionOverwriteMode is not available.

Looks like it's attempting to set a custom Spark config, which I don't think is allowed on SQL warehouses.
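
One possible way around the session config, not confirmed anywhere in this thread, is to overwrite the affected partitions with an explicit predicate instead of relying on spark.sql.sources.partitionOverwriteMode. The sketch below only builds the SQL text for Delta's `INSERT INTO ... REPLACE WHERE` form. Whether that statement is accepted by a given SQL warehouse version is an assumption to verify, and the helper names, table names, and the `day` column are hypothetical.

```python
def sql_literal(value):
    """Render a Python value as a SQL literal (strings quoted, quotes doubled)."""
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    return str(value)


def replace_where_statement(target, source, partition_column, partition_values):
    """Build an INSERT ... REPLACE WHERE statement for the given partitions.

    Assumption: the target is a Delta table and the engine supports
    REPLACE WHERE; no session-level Spark config is set.
    """
    if not partition_values:
        raise ValueError("at least one partition value is required")
    # De-duplicate and sort so the generated SQL is deterministic.
    values = ", ".join(sql_literal(v) for v in sorted(set(partition_values)))
    predicate = f"{partition_column} IN ({values})"
    # The SELECT is filtered with the same predicate so that every inserted
    # row falls inside the range being replaced.
    return (
        f"INSERT INTO {target} REPLACE WHERE {predicate} "
        f"SELECT * FROM {source} WHERE {predicate}"
    )


# Hypothetical table names, for illustration only.
statement = replace_where_statement(
    "analytics.events", "events__tmp", "day", ["2024-02-26", "2024-02-25"]
)
print(statement)
```

Unlike a true dynamic partition overwrite, this requires knowing the affected partition values up front, for example by first querying the distinct values of the partition column in the staged data.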

benwhelankf avatar Feb 26 '24 13:02 benwhelankf

Are there any plans to implement it? Or any updates?

SemyonSinchenko avatar May 29 '24 08:05 SemyonSinchenko