dbt-core Fix filter parsing bug

resolves #9507

Problem

Currently, if you pass a string into a filter YAML param, it will be assumed to be a list. This means your semantic manifest ends up with a list of strings for where filters. For example, this YAML:

filter: |
    {{ Dimension('customer__customer_type') }}  = 'new'

would look like this in the semantic manifest:

{"where_filters": [{"where_sql_template": "{"}, {"where_sql_template": "{"}, {"where_sql_template": " "}, {"where_sql_template": "D"}, {"where_sql_template": "i"}, {"where_sql_template": "m"}, {"where_sql_template": "e"}, {"where_sql_template": "n"}, {"where_sql_template": "s"}, {"where_sql_template": "i"}, {"where_sql_template": "o"}, {"where_sql_template": "n"}, {"where_sql_template": "("}, {"where_sql_template": "'"}, {"where_sql_template": "c"}, {"where_sql_template": "u"}, {"where_sql_template": "s"}, {"where_sql_template": "t"}, {"where_sql_template": "o"}, {"where_sql_template": "m"}, {"where_sql_template": "e"}, {"where_sql_template": "r"}, {"where_sql_template": "_"}, {"where_sql_template": "_"}, {"where_sql_template": "c"}, {"where_sql_template": "u"}, {"where_sql_template": "s"}, {"where_sql_template": "t"}, {"where_sql_template": "o"}, {"where_sql_template": "m"}, {"where_sql_template": "e"}, {"where_sql_template": "r"}, {"where_sql_template": "_"}, {"where_sql_template": "t"}, {"where_sql_template": "y"}, {"where_sql_template": "p"}, {"where_sql_template": "e"}, {"where_sql_template": "'"}, {"where_sql_template": ")"}, {"where_sql_template": " "}, {"where_sql_template": "}"}, {"where_sql_template": "}"}, {"where_sql_template": " "}, {"where_sql_template": " "}, {"where_sql_template": "="}, {"where_sql_template": " "}, {"where_sql_template": "'"}, {"where_sql_template": "n"}, {"where_sql_template": "e"}, {"where_sql_template": "w"}, {"where_sql_template": "'"}]}

This is because the dataclass from_dict() method only works for unions if Union is the outermost type annotation. We had it nested in an Optional annotation, so this fixes that.

Solution

Checklist

[x] I have read the contributing guide and understand what's expected of me
[x] I have run this code in development and it appears to resolve the stated issue
[x] This PR includes tests, or tests are not required/relevant for this PR
[x] This PR has no interface changes (e.g. macros, cli, logs, json artifacts, config files, adapter interface, etc) or this PR has already received feedback and approval from Product or DX
[x] This PR includes type annotations for new and modified functions

Feb 02 '24 00:02 courtneyholcomb

Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide.

Feb 02 '24 00:02 github-actions[bot]

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Comparison is base (2411f93) 87.97% compared to head (50fc004) 87.93%. Report is 4 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #9508      +/-   ##
==========================================
- Coverage   87.97%   87.93%   -0.05%     
==========================================
  Files         167      167              
  Lines       22171    22171              
==========================================
- Hits        19506    19496      -10     
- Misses       2665     2675      +10

Flag	Coverage Δ
integration	`85.51% <100.00%> (-0.12%)`	:arrow_down:
unit	`61.90% <100.00%> (ø)`

Flags with carried forward coverage won't be shown. Click here to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

Feb 02 '24 00:02 codecov[bot]

based on my minuscule knowledge of Mashumaro (where the from_dict method apparently comes from)

@tlento I thought that too, but if you go into the definition in Mashumaro, it's just the type stubs. Took me forever to track this down as a result 😅 but turns out from_dict() is just coming from dataclass.

Feb 02 '24 17:02 courtneyholcomb

@tlento I thought that too, but if you go into the definition in Mashumaro, it's just the type stubs. Took me forever to track this down as a result 😅 but turns out from_dict() is just coming from dataclass.

Well I just learned something today. There is no from_dict method on a Python dataclass, which is why I was so confused at first, and that type stub is indeed non-functional as written, but I figured there was something that Mashumaro was doing to populate the body with some type-specific extraction.

Turns out the library adds it on the fly via a dynamic codegen + compile step invoked from the DataclassDictMixin.

Consequently, I have no idea what the resulting from_dict code is actually doing or why this was an issue or why or how this PR fixes it, but I 100% trust that this was the issue because I can easily imagine how it might happen.

Were you able to actually get to the from_dict method body in your investigation? I'd be curious to see what it looks like, since my feeble human brain is not able to assemble the recursively-generated code.

Wild stuff, either way.

Feb 03 '24 01:02 tlento

Consequently, I have no idea what the resulting from_dict code is actually doing

You can see the generated code by enabling debug in the config parameter as described in the docs.

why this was an issue

Currently, Union types are handled naively by trying to use a handler for each variant type in the loop. Values of typestr are handled as is by default and that can be an issue in Union. The most reliable solution in this case would be to register a custom deserialization method. For example, if str is the first variant on the list, then using a simple isinstance check as in this example can help. Other options may include a custom method for list[str] or the whole union. You can play with debug mode to see which method is the best for you. I understand that this imperfection might be confusing and dangerous in some cases, so I have plans to redo the processing of unions in next versions.

If you still have any questions or need help, I invite you to open an issue for discussion.

Feb 03 '24 05:02 Fatal1ty

Added tests! @QMalcolm thank you for pointing me to where those should be!

Note that in the process of writing tests, I discovered a separate bug. If you try to use a list filter that includes jinja on a metric or an input measure, you'll get an error when you run dbt parse. For example, parsing the following YAML:

metrics:
  - name: collective_tenure_measure_filter_list
    label: "Collective tenure2"
    description: Total number of years of team experience
    type: simple
    type_params:
      measure:
        name: "years_tenure"
        filter:
          - "{{ Dimension('id__loves_dbt') }} is true"

Gets this error: CompilationError("Could not render {{ Dimension('id__loves_dbt') }} is true: 'Dimension' is undefined") The exact same YAML works if you include the filter as a string.

I'll be putting up a separate issue for that bug. But that's the reason you don't see any test cases for input measures or metrics with list filters. The same bug does not exist for filters on saved queries, oddly, so I was able to include list filter tests for those.

Feb 13 '24 19:02 courtneyholcomb

I'm not sure why the artifacts check is failing in CI - it looks like the action itself is erroring? But LMK if there's something I need to fix there!

Feb 13 '24 19:02 courtneyholcomb

I'm not sure why the artifacts check is failing in CI - it looks like the action itself is erroring? But LMK if there's something I need to fix there!

The failing CI check is one we've added recently. I've added the label artifact_minor_upgrade which essentially skips the check.

Context: we want to strongly control changes to /artifacts to help identify and mitigate breaking changes. To do this we added a CI step which checks for any file changes in /artifacts. Once we someone verifies the change isn't breaking, we add the artifact_minor_upgrade label to acknowledge this. We could implement a smarter check which recognizes what is and isn't a breaking change, and maybe we will in the future, but this gets the job done for now.

Feb 17 '24 12:02 QMalcolm