dbt-core icon indicating copy to clipboard operation
dbt-core copied to clipboard

[CT-3229] Materialized views are not recreated after configuration change

Open aBBDnGus opened this issue 2 years ago • 8 comments

Is this a new bug in dbt-core?

  • [X] I believe this is a new bug in dbt-core
  • [X] I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

When the model definition of a materialized view is changed and then dbt run is executed, the materialized view is refreshed, but the changes are not applied. The changes are applied only with dbt run --full-refresh. This behaviour is independent from the parameter on_configuration_change.

Expected Behavior

  • If dbt run is executed without changing the model, the materialized view should be refreshed (works)
  • If dbt run is executed with changes, the materialized view should be recreated (does not work: just a refresh happens)
  • If dbt run --full-refresh is executed, the materialized view should always be recreated (works)

Steps To Reproduce

  1. Create file test.sql
{{ config(materialized='materialized_view') }}
SELECT 1
  1. Execute dbt run
  2. The materialized view is created, log output: create materialized view if not exists "database"."schema"."test" as SELECT 1;
  3. Change test.sql
{{ config(materialized='materialized_view') }}
SELECT 1, 2
  1. Execute dbt run
  2. The new definition is not applied, log output: refresh materialized view "database"."schema"."test"

Relevant log output

No response

Environment

- OS: Windows
- Python: 3.11.5
- dbt-core: 1.6.6

Which database adapter are you using with dbt?

postgres

Additional Context

No response

aBBDnGus avatar Oct 16 '23 11:10 aBBDnGus

fwiw i'm also experiencing this, i did mistakenly raise an issue here: https://github.com/dbt-labs/dbt-redshift/issues/638 which i've now closed as a dupe of this. Linking here as it may add some more information

SpiceyC avatar Oct 21 '23 11:10 SpiceyC

I'll start by saying you're correct, this definitely doesn't feel intuitive and very likely needs to be addressed. But I'm also going to argue semantics. The change highlighted in this issue is not a configuration change, but rather a change to the sql itself. I think of configuration changes as the contents of the config() block. Tracking changes to the sql is easier in some platforms and more difficult in others; in particular, some platforms will save the sql as it was written, and some will save a rewritten/optimized version of the sql. It's potentially a difficult thing to generalize.

So I'm going to offer a rephrasing of this issue for folks to discuss:

Should we monitor changes to the sql within a model? If so, do we work that change into the materialization as another condition for logic flow (it would likely need to be a full refresh)? If not, how are we clear to the users that we are not monitoring for sql changes and --full-refresh will need to be explicitly provided?

Cc: @Fleid @graciegoheen @dataders @dbeatty10 for awareness

mikealfare avatar Oct 23 '23 22:10 mikealfare

Refinement

@Fleid curious to hear if you expected a change in the SQL logic for a materialized view to trigger a data and/or database update when on_configuration_change="apply"?

If changing the SQL logic isn't taken into account, then I think that would be an outlier compared to our other materialization types.

Looking more closely at materialized types

Table, view, & ephemeral

When materialized is table, view, or ephemeral, and the SQL logic changes, it will update any underlying database object and change the resulting data when queried.

Incremental

The only other standard materialization type other than materialized view that needs similar "on change" logic is the incremental materialization. Even with the default of on_schema_change="ignore", the new SQL logic will also be taken into account, it just won't add/remove columns or change their data types.

dbeatty10 avatar Oct 23 '23 23:10 dbeatty10

Is that what I want? Yes. Is that what's in the spec? No.

Why is because there is no way for us to determine if the SQL has changed:

  • We can only compare the desired state (model logic) to the previously applied state (current logic existing in the database): one of our primitives is that dbt core doesn't have access to the previously desired state (model logic from a previous dbt run)
  • And sadly, most warehouses don't allow us to extract the applied state (as in BigQuery please give me the DDL of that materialized view). Quite frankly, even if they did, it would be a nightmare to compare these 2 "strings"

What we can do instead is add on_schema_change, the same way we have for incremental models. Note that on_configuration_change is inspired by on_schema_change, but it's not random that it's not the same name (+1 @mikealfare on the semantics here) ;) Now we would not be tracking the logic anymore, but the signature (column names and types) of the model. That's do-able.

Fleid avatar Oct 25 '23 17:10 Fleid

Was going to bump this as I am experiencing a similar roadblock.

I am considering converting my materialized views to tables to mitigate this, and would rather not do this. I like the changes that @Fleid is suggesting, that seems to make sense.

I have also considered creating my own custom materialization to address this.

Just wanted to chime in and say that this sort of thing would be hugely beneficial.

Dbt is great! 🔥 🔥 Thank you all for your work.

jakedmarcotte avatar Feb 16 '24 23:02 jakedmarcotte

Hello Everyone!

I am facing the same issue so would like to bbump it. Recently, DBT labs support team suggested 2 solutions to me:

  1. --full-refresh whenever an upstream model is updated --> This works
  2. on_configuration_change = apply in dbt_project --> This does not work.

I noticed that the issue is still open, so don't know if the 2nd step is doable in any way?

jazib-rasheed17 avatar Jul 17 '24 17:07 jazib-rasheed17

Hi folks - I wanted to give an update to this issue. There has been some great feedback raised on this issue, and I appreciate the time everyone has taken here. As of now, we are unable to prioritize this work for our roadmap. However I am going to leave this issue open so that we may continue to have a space to have this discussion as well as continue to collect feedback.

amychen1776 avatar Jul 23 '24 17:07 amychen1776