dbt-core
dbt-core copied to clipboard
[CT-981] Raise clear warning/error if yaml property key is misnamed
Is this a new bug in dbt-core?
- [X] I believe this is a new bug in dbt-core
- [X] I have searched the existing issues, and I could not find an existing issue for this bug
Current Behavior
I am testing a model which have 2 tests for a column. If tests are called under a field named test and not testS, it runs satisfactorily just the 1st of the tests, without warning/logging errors about that the 2nd test did not run.
Expected Behavior
To have an error output like the following: Found more than 1 test under the test field. Please reference the structure as "tests"
Steps To Reproduce
- create a custom test
- apply that custom test and a basic one to a column of a model
Relevant log output
The problem is that there is no output warning about just the first test ran
Environment
- OS: WSL for windows (so we should say ubuntu)
- Python: 3.8.2
- dbt: 1.1.0
Which database adapter are you using with dbt?
bigquery
Additional Context
I created this custom test and regardless the name of the column, it always passed. Then I realised that the test section in the yml file should be named tests, in plural. Then, it works.
@albertovpd Thanks for opening!
I tried this out locally, and I found that test
is just completely ignored. Only tests defined under the tests
key (correctly named) were actually defined and run.
I think the deal here is that our dataclasses / validation for both model
and columns
support additional properties being defined:
models:
- name: some_model
invalid_key:
- something
columns:
- name: id
invalid_key:
- something_else
It would be better if dbt raised a clear and explicit error message that invalid_key
is not supported! At least a warning, surely. Especially now that we have dedicated properties to accept arbitrary, user-provided key-value pairs — namely config
and meta
.
You're right @jtcohen6 Also your proposed solution is great from the user point of view
I just closed an older issue that's really the same as this one: https://github.com/dbt-labs/dbt-core/issues/4280
That one had several comments/upvotes. This is definitely a UX improvement that several people have asked for explicitly, and it would likely benefit many many more!
From @alittlesliceoftom here:
I was going to raise an issue: "Incorrect Configs Are Silently Ignored - they should raise errors"
But this seems like a close fit. Adding an example:
{{ config( materialized = 'table', dist= 'REPLICATE', this_field_is_made_up = 'and_has_no_effect' ) }}
Should fail, but in fact it just does nothing.I think this should fail as it's easy for a developer to type "dist" or "distribution" and get different results. Or spell materialized as materialised.
It would be more developer friendly to let them know there's an error.
Another example from "the wild" (ie Community Slack thread)
user reported that the datawarehouse was not changed as defined in their config
{{
config(
materialized='table',
schema='finance',
snowflake_datawarehouse='DATAWAREHOUSE_MEDIUM'
)
}}
the issue is they were using snowflake_datawarehouse
instead of snowflake_warehouse
Here is a further example of someone creating an invalid model config (they confused it with a source config) and it would have been helpful for this to throw a parsing error because their yaml is incorrectly formatted, rather than failing in a more obscure way. https://getdbt.slack.com/archives/CBSQTAPLG/p1702483400951449
Thanks for linking that thread in Slack to this feature request @mwstanleyft !
This is a nice summary of the surprise factor:
Tbh I'm always unpleasantly surprised that dbt doesn't enforce a correct yaml schema and throw an error if the yaml is incorrect. You can get a lot of obscure errors simply because you've made a typo or something in the yaml that results in an invalid file, but dbt finds a way regardless
Acceptance criteria
- Unrecognized config names raise a warning
- Using a
config()
Jinja macro within amodel
,snapshot
,test
, etc. SQL file - Using a
config
property in a.yml
file - From the
dbt_project.yml
file, under the corresponding resource key (models:
,snapshots:
,tests:
, etc)
- Using a
- Unrecognized property names raise a warning
- In dbt, properties are declared in
.yml
files, in the same directory as your resources, and could be namedschema.yml
,_properties.yml
,whatever_you_want.yml
, etc.
- In dbt, properties are declared in
- document how to turn on a "strict mode" for unrecognized config/property names that converts relevant warnings to error messages
For refinement
- there is a method to add certain names to an "allow list" that bypasses a warning message for those particular names
- there is some kind of string distance metric (like Levenshtein distance, etc.) used to emit a warning like "Did you mean
materialized
instead ofmaterialization
?" - warnings vs. errors
We may not want to raise errors at this time since that would increase the risk of introducing breaking changes to projects that are currently working (even if they have been misconfigured in an invisible way). Raising a warning is a more gentle on-ramp than jumping all the way to raising an error. The trade-off is that warnings are less visible than errors and may be overlooked. If we choose to go the "warnings only" route, we can still reserve the right to elevate this all the way to an error at some point in the future.