dbt-core icon indicating copy to clipboard operation
dbt-core copied to clipboard

[CT-981] Raise clear warning/error if yaml property key is misnamed

Open albertovpd opened this issue 2 years ago • 2 comments

Is this a new bug in dbt-core?

  • [X] I believe this is a new bug in dbt-core
  • [X] I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

I am testing a model which have 2 tests for a column. If tests are called under a field named test and not testS, it runs satisfactorily just the 1st of the tests, without warning/logging errors about that the 2nd test did not run.

image

Expected Behavior

To have an error output like the following: Found more than 1 test under the test field. Please reference the structure as "tests"

Steps To Reproduce

  1. create a custom test
  2. apply that custom test and a basic one to a column of a model

Relevant log output

The problem is that there is no output warning about just the first test ran

Environment

- OS: WSL for windows (so we should say ubuntu)
- Python: 3.8.2
- dbt: 1.1.0

Which database adapter are you using with dbt?

bigquery

Additional Context

image

I created this custom test and regardless the name of the column, it always passed. Then I realised that the test section in the yml file should be named tests, in plural. Then, it works.

albertovpd avatar Aug 03 '22 10:08 albertovpd

@albertovpd Thanks for opening!

I tried this out locally, and I found that test is just completely ignored. Only tests defined under the tests key (correctly named) were actually defined and run.

I think the deal here is that our dataclasses / validation for both model and columns support additional properties being defined:

models:
  - name: some_model
    invalid_key:
      - something
    columns:
      - name: id
        invalid_key:
          - something_else

It would be better if dbt raised a clear and explicit error message that invalid_key is not supported! At least a warning, surely. Especially now that we have dedicated properties to accept arbitrary, user-provided key-value pairs — namely config and meta.

jtcohen6 avatar Aug 09 '22 12:08 jtcohen6

You're right @jtcohen6 Also your proposed solution is great from the user point of view

albertovpd avatar Aug 09 '22 12:08 albertovpd

I just closed an older issue that's really the same as this one: https://github.com/dbt-labs/dbt-core/issues/4280

That one had several comments/upvotes. This is definitely a UX improvement that several people have asked for explicitly, and it would likely benefit many many more!

jtcohen6 avatar Feb 01 '23 10:02 jtcohen6

From @alittlesliceoftom here:

I was going to raise an issue: "Incorrect Configs Are Silently Ignored - they should raise errors"

But this seems like a close fit. Adding an example: {{ config( materialized = 'table', dist= 'REPLICATE', this_field_is_made_up = 'and_has_no_effect' ) }} Should fail, but in fact it just does nothing.

I think this should fail as it's easy for a developer to type "dist" or "distribution" and get different results. Or spell materialized as materialised.

It would be more developer friendly to let them know there's an error.

dbeatty10 avatar May 24 '23 13:05 dbeatty10

Another example from "the wild" (ie Community Slack thread)

user reported that the datawarehouse was not changed as defined in their config

{{
    config(
        materialized='table',
        schema='finance',
        snowflake_datawarehouse='DATAWAREHOUSE_MEDIUM'
    )
}}

the issue is they were using snowflake_datawarehouse instead of snowflake_warehouse

dataders avatar Sep 07 '23 13:09 dataders

Here is a further example of someone creating an invalid model config (they confused it with a source config) and it would have been helpful for this to throw a parsing error because their yaml is incorrectly formatted, rather than failing in a more obscure way. https://getdbt.slack.com/archives/CBSQTAPLG/p1702483400951449

mwstanleyft avatar Dec 15 '23 09:12 mwstanleyft

Thanks for linking that thread in Slack to this feature request @mwstanleyft !

This is a nice summary of the surprise factor:

Tbh I'm always unpleasantly surprised that dbt doesn't enforce a correct yaml schema and throw an error if the yaml is incorrect. You can get a lot of obscure errors simply because you've made a typo or something in the yaml that results in an invalid file, but dbt finds a way regardless

dbeatty10 avatar Dec 15 '23 16:12 dbeatty10

Acceptance criteria

  • Unrecognized config names raise a warning
  • Unrecognized property names raise a warning
    • In dbt, properties are declared in .yml files, in the same directory as your resources, and could be named schema.yml, _properties.yml, whatever_you_want.yml, etc.
  • document how to turn on a "strict mode" for unrecognized config/property names that converts relevant warnings to error messages

For refinement

  • there is a method to add certain names to an "allow list" that bypasses a warning message for those particular names
  • there is some kind of string distance metric (like Levenshtein distance, etc.) used to emit a warning like "Did you mean materialized instead of materialization?"
  • warnings vs. errors

We may not want to raise errors at this time since that would increase the risk of introducing breaking changes to projects that are currently working (even if they have been misconfigured in an invisible way). Raising a warning is a more gentle on-ramp than jumping all the way to raising an error. The trade-off is that warnings are less visible than errors and may be overlooked. If we choose to go the "warnings only" route, we can still reserve the right to elevate this all the way to an error at some point in the future.

dbeatty10 avatar Dec 15 '23 16:12 dbeatty10