dbt-databricks
replace_where fails when jinja statement is used in both predicate and select
Describe the bug
If an incremental model using the replace_where strategy uses a Jinja expression in both the incremental predicate and the select, a Runtime Error is raised when the expression's value changes between two runs of the model, even though nothing has changed in the model itself or in any macros it calls.
We ran into this when we created a macro that returns the previous month and used it in the incremental predicate. When the month changed, we got a Runtime Error when we tried to run the models. It appears the previous month was used in the INSERT part of the generated SQL statement while the current month was used in the SELECT part.
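For illustration, our setup was roughly of the following shape (the previous_month macro, the model file, and the month column below are hypothetical stand-ins for our actual code, which follows the same pattern):
-- macros/previous_month.sql (hypothetical)
{% macro previous_month() %}
    {#- The first day of the current month minus one day falls in the previous month. -#}
    {{ return((run_started_at.replace(day=1) - modules.datetime.timedelta(days=1)).strftime('%Y-%m')) }}
{% endmacro %}

-- models/monthly_model.sql (hypothetical)
{%- set month = previous_month() -%}
{{
    config(
        materialized='incremental',
        incremental_strategy='replace_where',
        incremental_predicates="month = '" ~ month ~ "'"
    )
}}
select '{{ month }}' as month, current_timestamp() as loaded_at
As long as previous_month() returns the same value on every run this works; the failure only shows up on the first run after the month rolls over.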
To someone who doesn't know the internals of dbt, this looks like different caching happening for the SELECT part and the INSERT part of the model: when I look at the code in target/compiled, the new value of the Jinja expression is used, but in the INSERT statement in target/run the old value is used.
Steps To Reproduce
Create a model with the following contents; I called it test_model.sql:
{%- set predicate = run_started_at -%}
{% set INCREMENTAL_PREDICATE %}
my_column = '{{ predicate }}'
{%- endset %}
{{
    config(
        materialized='incremental',
        incremental_strategy='replace_where',
        incremental_predicates=INCREMENTAL_PREDICATE,
        alias='test_model'
    )
}}
select '{{ predicate }}' as my_column
First, run the model once to do the initial load, e.g. dbt run -s test_model. Then run it two more times normally, i.e. without --full-refresh. Observe that the final run fails with a Runtime Error. Looking at the compiled code in target/run/.../test_model.sql, we can see that the timestamp in the INSERT part of the code is the timestamp of the first incremental run, while in target/compiled/.../test_model.sql the timestamp in the SELECT part of the code is the timestamp of the second incremental run.
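To make the mismatch concrete, the tail of the statement in target/run is, schematically, something like the following (a hand-written paraphrase using the timestamps from the log output below, not the exact SQL that dbt generates):
insert into stg_gza.test_model
replace where my_column = '2023-12-19 14:15:36.319930+00:00'  -- timestamp of the first incremental run (stale predicate)
select '2023-12-19 14:16:32.485265+00:00' as my_column        -- timestamp of the second incremental run (fresh select)
The rows produced by the select no longer satisfy the stale replace where predicate, which matches the Runtime Error in the log output below.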
Expected behavior
I would expect the timestamp of the second incremental run to be used in both the INSERT and SELECT parts of the model.
Screenshots and log output
Output from the failed run:
> dbt run -s test_model (dbt-17)
14:16:32 Running with dbt=1.7.4
14:16:33 Registered adapter: databricks=1.7.3
14:16:35 Found 491 models, 911 tests, 443 sources, 0 exposures, 0 metrics, 995 macros, 0 groups, 0 semantic models
14:16:35
14:17:02 Concurrency: 9 threads (target='test01')
14:17:02
14:17:02 1 of 1 START sql incremental model stg_gza.test_model .......................... [RUN]
14:17:04 1 of 1 ERROR creating sql incremental model stg_gza.test_model ................. [ERROR in 2.03s]
14:17:05
14:17:05 Finished running 1 incremental model in 0 hours 0 minutes and 30.80 seconds (30.80s).
14:17:06
14:17:06 Completed with 1 error and 0 warnings:
14:17:06
14:17:06 Runtime Error in model test_model (models/test_model.sql)
Data written out does not match replaceWhere '(my_column = '2023-12-19 14:15:36.319930+00:00')'.
CHECK constraint EXPRESSION(('my_column = 2023-12-19 14:15:36.319930+00:00)) (my_column = '2023-12-19 14:15:36.319930+00:00') violated by row with values:
- my_column : 2023-12-19 14:16:32.485265+00:00
14:17:06
14:17:06 Done. PASS=0 WARN=0 ERROR=1 SKIP=0 TOTAL=1
System information
The output of dbt --version:
> dbt --version
Core:
- installed: 1.7.4
- latest: 1.7.4 - Up to date!
Plugins:
- databricks: 1.7.3 - Up to date!
- spark: 1.7.1 - Up to date!
The operating system you're using: macOS version 13.6.3, but also reproduced on Windows 11 and on a Linux server (I'm unsure of the exact distribution).
The output of python --version: Python 3.10.13
Thanks for reporting. Does anything jump out if you look at the compiled version of the sql? Jinja is crazy...