dbt-project-evaluator icon indicating copy to clipboard operation
dbt-project-evaluator copied to clipboard

Snowflake extract function arguments are incorrectly flagged as hard coded references

Open deanmorin opened this issue 6 months ago • 1 comments

Describe the bug

The extract function is snowflake includes the keyword from. When the date_or_time_expr arg following this keyword is qualified with the table name and the closing parenthesis is on a new line, it is incorrectly flagged as a hardcoded reference.

Steps to reproduce

select
  1
  , extract(year from my_model.date_1)
  , extract(year from my_model.date_2
  )
  , extract(year from date_3
  )
from 
  {{ ref('my_model') }}

Only my_model.date_2 will be incorrectly flagged.

Expected results

No hard coded references found.

Actual results

my_model.date_2 is flagged as a hard coded reference.

Screenshots and log output

System information

The contents of your packages.yml file:

Which database are you using dbt with?

  • [ ] postgres
  • [ ] redshift
  • [ ] bigquery
  • [x] snowflake
  • [ ] trino/starburst
  • [ ] other (specify: ____________)

The output of dbt --version:

Core:
  - installed: 1.7.3
  - latest:    1.7.4 - Update available!

  Your version of dbt-core is out of date!
  You can find instructions for upgrading here:
  https://docs.getdbt.com/docs/installation

Plugins:
  - snowflake: 1.7.1 - Up to date!

Additional context

Are you interested in contributing the fix?

Unfortunately I do not have time.

deanmorin avatar Jan 02 '24 19:01 deanmorin

Explanation

This is getting caught by from_table_1 regex matching.

matches (from or join) followed by some spaces and then <something>.<something_else> 
where each <something> is enclosed by (` or [ or " or ' or nothing)

The reason my_model.date_2 is only being caught when the closing parenthesis is on a newline, is because the regex matching for from_table_1 ends with "followed by a whitespace character or end of string":

# eighth matching group
# 1 or 0 of (closing bracket, backtick, or quotation mark) followed by a whitespace character or end of string
([\]`\"\']?)(?=\s|$)

Troubleshooting

Did a bit of troubleshooting on this, and wanted to document a few solutions that don't work.

To reproduce this error, I updated the SQL in stg_model_4 to:

select 
    1 as id
    -- ,extract(year from stg_model_2.date_field
    -- ) as year
-- from {{ ref('stg_model_2') }}
  1. I tried to adjust the end of from_table_1 to instead be "not followed by a period"
# eighth matching group
# 1 or 0 of (closing bracket, backtick, or quotation mark) followed by a whitespace character or end of string
([\]`\"\']?)(?!\.)

This didn't work because then, things like "my_db"."my_schema"."my_table" is being incorrectly flagged as "my_db"."my_schema.

  1. I tried to use a negative look behind function to say "exclude any statements following extract(...
(?<!extract\([^\)]*)

This didn't work because look behinds need to have a fixed width. Screenshot 2024-01-04 at 2 15 55 PM

I will keep thinking on this, but wanted to document some initial challenges. Open to other ideas for how we can prevent this from being incorrectly flagged!

graciegoheen avatar Jan 04 '24 21:01 graciegoheen