dbt-project-evaluator
dbt-project-evaluator copied to clipboard
Snowflake extract function arguments are incorrectly flagged as hard coded references
Describe the bug
The extract
function is snowflake includes the keyword from
. When the date_or_time_expr
arg following this keyword is qualified with the table name and the closing parenthesis is on a new line, it is incorrectly flagged as a hardcoded reference.
Steps to reproduce
select
1
, extract(year from my_model.date_1)
, extract(year from my_model.date_2
)
, extract(year from date_3
)
from
{{ ref('my_model') }}
Only my_model.date_2
will be incorrectly flagged.
Expected results
No hard coded references found.
Actual results
my_model.date_2
is flagged as a hard coded reference.
Screenshots and log output
System information
The contents of your packages.yml
file:
Which database are you using dbt with?
- [ ] postgres
- [ ] redshift
- [ ] bigquery
- [x] snowflake
- [ ] trino/starburst
- [ ] other (specify: ____________)
The output of dbt --version
:
Core:
- installed: 1.7.3
- latest: 1.7.4 - Update available!
Your version of dbt-core is out of date!
You can find instructions for upgrading here:
https://docs.getdbt.com/docs/installation
Plugins:
- snowflake: 1.7.1 - Up to date!
Additional context
Are you interested in contributing the fix?
Unfortunately I do not have time.
Explanation
This is getting caught by from_table_1 regex matching.
matches (from or join) followed by some spaces and then <something>.<something_else>
where each <something> is enclosed by (` or [ or " or ' or nothing)
The reason my_model.date_2
is only being caught when the closing parenthesis is on a newline, is because the regex matching for from_table_1
ends with "followed by a whitespace character or end of string":
# eighth matching group
# 1 or 0 of (closing bracket, backtick, or quotation mark) followed by a whitespace character or end of string
([\]`\"\']?)(?=\s|$)
Troubleshooting
Did a bit of troubleshooting on this, and wanted to document a few solutions that don't work.
To reproduce this error, I updated the SQL in stg_model_4
to:
select
1 as id
-- ,extract(year from stg_model_2.date_field
-- ) as year
-- from {{ ref('stg_model_2') }}
- I tried to adjust the end of
from_table_1
to instead be "not followed by a period"
# eighth matching group
# 1 or 0 of (closing bracket, backtick, or quotation mark) followed by a whitespace character or end of string
([\]`\"\']?)(?!\.)
This didn't work because then, things like "my_db"."my_schema"."my_table"
is being incorrectly flagged as "my_db"."my_schema
.
- I tried to use a negative look behind function to say "exclude any statements following
extract(...
(?<!extract\([^\)]*)
This didn't work because look behinds need to have a fixed width.
I will keep thinking on this, but wanted to document some initial challenges. Open to other ideas for how we can prevent this from being incorrectly flagged!