dbt-core
dbt-core copied to clipboard
[CT-3229] Materialized views are not recreated after configuration change
Is this a new bug in dbt-core?
- [X] I believe this is a new bug in dbt-core
- [X] I have searched the existing issues, and I could not find an existing issue for this bug
Current Behavior
When the model definition of a materialized view is changed and then dbt run is executed, the materialized view is refreshed, but the changes are not applied. The changes are applied only with dbt run --full-refresh. This behaviour is independent from the parameter on_configuration_change.
Expected Behavior
- If
dbt runis executed without changing the model, the materialized view should be refreshed (works) - If
dbt runis executed with changes, the materialized view should be recreated (does not work: just a refresh happens) - If
dbt run --full-refreshis executed, the materialized view should always be recreated (works)
Steps To Reproduce
- Create file
test.sql
{{ config(materialized='materialized_view') }}
SELECT 1
- Execute
dbt run - The materialized view is created, log output:
create materialized view if not exists "database"."schema"."test" as SELECT 1; - Change
test.sql
{{ config(materialized='materialized_view') }}
SELECT 1, 2
- Execute
dbt run - The new definition is not applied, log output:
refresh materialized view "database"."schema"."test"
Relevant log output
No response
Environment
- OS: Windows
- Python: 3.11.5
- dbt-core: 1.6.6
Which database adapter are you using with dbt?
postgres
Additional Context
No response
fwiw i'm also experiencing this, i did mistakenly raise an issue here: https://github.com/dbt-labs/dbt-redshift/issues/638 which i've now closed as a dupe of this. Linking here as it may add some more information
I'll start by saying you're correct, this definitely doesn't feel intuitive and very likely needs to be addressed. But I'm also going to argue semantics. The change highlighted in this issue is not a configuration change, but rather a change to the sql itself. I think of configuration changes as the contents of the config() block. Tracking changes to the sql is easier in some platforms and more difficult in others; in particular, some platforms will save the sql as it was written, and some will save a rewritten/optimized version of the sql. It's potentially a difficult thing to generalize.
So I'm going to offer a rephrasing of this issue for folks to discuss:
Should we monitor changes to the sql within a model? If so, do we work that change into the materialization as another condition for logic flow (it would likely need to be a full refresh)? If not, how are we clear to the users that we are not monitoring for sql changes and --full-refresh will need to be explicitly provided?
Cc: @Fleid @graciegoheen @dataders @dbeatty10 for awareness
Refinement
@Fleid curious to hear if you expected a change in the SQL logic for a materialized view to trigger a data and/or database update when on_configuration_change="apply"?
If changing the SQL logic isn't taken into account, then I think that would be an outlier compared to our other materialization types.
Looking more closely at materialized types
Table, view, & ephemeral
When materialized is table, view, or ephemeral, and the SQL logic changes, it will update any underlying database object and change the resulting data when queried.
Incremental
The only other standard materialization type other than materialized view that needs similar "on change" logic is the incremental materialization. Even with the default of on_schema_change="ignore", the new SQL logic will also be taken into account, it just won't add/remove columns or change their data types.
Is that what I want? Yes. Is that what's in the spec? No.
Why is because there is no way for us to determine if the SQL has changed:
- We can only compare the desired state (model logic) to the previously applied state (current logic existing in the database): one of our primitives is that dbt core doesn't have access to the previously desired state (model logic from a previous dbt run)
- And sadly, most warehouses don't allow us to extract the applied state (as in BigQuery please give me the DDL of that materialized view). Quite frankly, even if they did, it would be a nightmare to compare these 2 "strings"
What we can do instead is add on_schema_change, the same way we have for incremental models.
Note that on_configuration_change is inspired by on_schema_change, but it's not random that it's not the same name (+1 @mikealfare on the semantics here) ;)
Now we would not be tracking the logic anymore, but the signature (column names and types) of the model. That's do-able.
Was going to bump this as I am experiencing a similar roadblock.
I am considering converting my materialized views to tables to mitigate this, and would rather not do this. I like the changes that @Fleid is suggesting, that seems to make sense.
I have also considered creating my own custom materialization to address this.
Just wanted to chime in and say that this sort of thing would be hugely beneficial.
Dbt is great! 🔥 🔥 Thank you all for your work.
Hello Everyone!
I am facing the same issue so would like to bbump it. Recently, DBT labs support team suggested 2 solutions to me:
- --full-refresh whenever an upstream model is updated --> This works
- on_configuration_change = apply in dbt_project --> This does not work.
I noticed that the issue is still open, so don't know if the 2nd step is doable in any way?
Hi folks - I wanted to give an update to this issue. There has been some great feedback raised on this issue, and I appreciate the time everyone has taken here. As of now, we are unable to prioritize this work for our roadmap. However I am going to leave this issue open so that we may continue to have a space to have this discussion as well as continue to collect feedback.