Format change in query of parent causes child model to be classified as indirectly modified
Problem
Changed formatting of the query of parent model causes child model to be classified as indirectly modified. Depending on the model kind this may cause unnecessary backfilling of the child.
Instead the expected behavior would be that a metadata change of the parent model wouldn't lead to the child model to be classified as indirectly modified.
sqlmesh==0.227.1
Reproducible example
Using the demo example project sqlmesh_example created with sqlmesh init.
Setup: Execute sqlmesh plan to create the prod environment
Change model DDL
"Obvious" metadata changes such as adding description and audit to the model DDL is classified as expected.
Making the following change to sqlmesh_example.incremental_model:
MODEL (
name sqlmesh_example.incremental_model,
kind INCREMENTAL_BY_TIME_RANGE (
time_column event_date
),
start '2020-01-01',
cron '@daily',
grain (id, event_date),
description "My incremental model",
audits NOT_NULL(columns:=(id)),
);
SELECT
id,
item_id,
event_date,
FROM
sqlmesh_example.seed_model
WHERE
event_date BETWEEN @start_date AND @end_date
Doing a plan:
sqlmesh plan
Terminal log:
.
======================================================================
Successfully Ran 1 tests against duckdb in 0.05 seconds.
----------------------------------------------------------------------
Differences from the `prod` environment:
Models:
└── Metadata Updated:
└── sqlmesh_example.incremental_model
---
+++
@@ -10,6 +10,9 @@
on_destructive_change 'ERROR',
on_additive_change 'ALLOW'
),
+ audits (NOT_NULL('columns' = (
+ id
+ ))),
grains ((id, event_date))
)
SELECT
Metadata Updated: sqlmesh_example.incremental_model
Apply - Virtual Update [y/n]: y
SKIP: No physical layer updates to perform
[1/1] sqlmesh_example.incremental_model [insert 2020-01-01 - 2025-11-02, audits ✔1] 0.01s
Auditing models ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100.0% • 1/1 • 0:00:00
✔ Model batches executed
SKIP: No model batches to execute
Updating virtual layer ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100.0% • 2/2 • 0:00:00
✔ Virtual layer updated
Changing query format
Adding some innocent whitespace to the query between id and the comma is classified as a metadata change and the sqlmesh_example.full_model is classified as indirectly modified:
MODEL (
name sqlmesh_example.incremental_model,
kind INCREMENTAL_BY_TIME_RANGE (
time_column event_date
),
start '2020-01-01',
cron '@daily',
grain (id, event_date),
description "My incremental model",
audits NOT_NULL(columns:=(id)),
);
SELECT
id ,
item_id,
event_date,
FROM
sqlmesh_example.seed_model
WHERE
event_date BETWEEN @start_date AND @end_date
Doing sqlmesh plan results in the following terminal log:
.
======================================================================
Successfully Ran 1 tests against duckdb in 0.05 seconds.
----------------------------------------------------------------------
Differences from the `prod` environment:
Models:
├── Indirectly Modified:
│ └── sqlmesh_example.full_model
└── Metadata Updated:
└── sqlmesh_example.incremental_model
Metadata Updated: sqlmesh_example.incremental_model
Apply - Virtual Update [y/n]: y
SKIP: No physical layer updates to perform
SKIP: No model batches to execute
Updating virtual layer ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100.0% • 2/2 • 0:00:00
✔ Virtual layer updated
We suspect this is a bug and that it was introduced in https://github.com/TobikoData/sqlmesh/pull/5256, particularly that is_indirectly_changed() should take into account if changes in the parent data hash are in fact metadata changes only.
@izeigerman