dbt-bigquery
[Bug] dbt clone throws an error when partition or cluster has changed
Is this a new bug in dbt-bigquery?
- [X] I believe this is a new bug in dbt-bigquery
- [X] I have searched the existing issues, and I could not find an existing issue for this bug
Current Behavior
Say I have a model that is partitioned at day granularity in production. I update the model to partition at month granularity and successfully build it in staging. In my production step, I then clone the model to production using dbt clone. BigQuery throws:
Cannot replace a table with a different partitioning spec. Instead, DROP the table,
and then recreate it. New partitioning spec is interval(type:month,field:timestamp)
and existing spec is interval(type:day,field:timestamp)
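For context, my understanding (an assumption; I have not traced the dbt-bigquery clone macros) is that a full-refresh clone ends up issuing something like the statement below, and BigQuery refuses to REPLACE a table whose partitioning spec differs from the existing one. Project and dataset names are placeholders:

-- illustrative only; placeholder project / dataset names
create or replace table `my-project.prod.sample_model`
clone `my-project.staging.sample_model`;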
Expected Behavior
When there's a partition or cluster change, we should drop the existing model first and then clone the model with the new partitioning / clustering.
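In plain BigQuery SQL, the behavior I'd expect is roughly the following (placeholder project / dataset names): drop first so the new spec can be applied, then clone.

-- drop the existing table so the new partitioning / clustering spec can be applied
drop table if exists `my-project.prod.sample_model`;

-- recreate it as a clone of the staging relation
create table `my-project.prod.sample_model`
clone `my-project.staging.sample_model`;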
Steps To Reproduce
Create a simple model:

-- sample_model.sql
{{
  config(
    cluster_by = ['col_1', 'col_2'],
    partition_by = {
      'field': 'timestamp',
      'data_type': 'timestamp',
      'granularity': 'day'
    },
  )
}}

select
  'col_1' as col_1,
  'col_2' as col_2,
  CURRENT_TIMESTAMP() as timestamp

Run dbt run -s sample_model -t staging (or against whatever dev / staging environment you have), then run dbt clone -s sample_model -t prod --full-refresh so the same model exists in prod.
Now change the clustering and partitioning in the model:

-- sample_model.sql
{{
  config(
    cluster_by = ['col_1'],
    partition_by = {
      'field': 'timestamp',
      'data_type': 'timestamp',
      'granularity': 'month'
    },
  )
}}

select
  'col_1' as col_1,
  'col_2' as col_2,
  CURRENT_TIMESTAMP() as timestamp

Run dbt run -s sample_model -t staging (or against whatever dev / staging environment you have) again, then run dbt clone -s sample_model -t prod so the model is cloned into prod. You should see an error thrown by BigQuery telling you to drop the model first.
Relevant log output
No response
Environment
- OS: Mac
- Python: 3.8.12
- dbt-core: 1.6.6
- dbt-bigquery: 1.6.7
Additional Context
Similar to the behavior when building incremental models with a partition / clustering change: https://github.com/dbt-labs/dbt-core/blob/40839c79fcf1854942f8db44d1a2648bffe0a2ab/plugins/bigquery/dbt/include/bigquery/macros/materializations/incremental.sql#L23-L29
https://github.com/dbt-labs/dbt-core/issues/2000
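To illustrate, here is a rough sketch of what an analogous guard could look like for clone on BigQuery. The macro name and structure are hypothetical (this is not the actual dbt-bigquery implementation), but should_full_refresh(), adapter.get_relation(), and adapter.drop_relation() are existing dbt helpers:

{# hypothetical sketch: drop the target first under --full-refresh so BigQuery
   does not reject a REPLACE with a different partitioning / clustering spec #}
{% macro bigquery__drop_and_clone(target_relation, state_relation) %}
  {% set existing = adapter.get_relation(
        database=target_relation.database,
        schema=target_relation.schema,
        identifier=target_relation.identifier) %}
  {% if existing is not none and should_full_refresh() %}
    {% do adapter.drop_relation(existing) %}
  {% endif %}
  create table {{ target_relation }}
  clone {{ state_relation }}
{% endmacro %}

In a real materialization this SQL would be executed via a statement() call; the sketch only shows where the drop guard would sit.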
Thanks for opening @SPTKL --- I'm glad you're finding the clone command useful.
My concern here is predominantly user-experience-related, namely: when is --full-refresh required, and when isn't it? Allow me to explain why I picked that as the framing question.
You propose the following as expected behavior:
When there's a partition or cluster change, we should drop the existing model first and then clone the model with the new partitioning / clustering.
I agree, but with an important caveat: only if the --full-refresh flag (or perhaps a new flag) is passed.
My rationale is that partitioning can have a large cost impact, especially on large tables. If a user accidentally clones a dev version of a model with new partitioning to prod and it errors because the partitioning doesn't match -- this is a good thing!
If you look at the corresponding incremental code you shared, the full DROP in lieu of REPLACE will only happen in the context of --full-refresh. In some ways, the --full-refresh flag is a synonym for "I don't care about what's already in the destination; delete it if you need to!"
What do you think? Let me know if I'm off base here.
@dataders thanks for your comment! Yes, that totally makes sense: the drop should only happen when --full-refresh is passed. It also seems like without --full-refresh, clone throws "Relation xxxxx already exists" instead of copying anything.
Thank you for the conversation here! I think we will need more information to formulate the right solution at this time. @dataders brings up a very good point: the cost of repartitioning a model because of an accidental rebuild is not a great experience. I'm also wary of how to ensure a seamless user experience with --full-refresh. Because of this, I am marking this as something we will not be tackling on our roadmap.