dbt-core
[CT-531] [Bug] More robust error messaging when keys are misspelled in yml files
Is there an existing issue for this?
- [X] I have searched the existing issues
Current Behavior
A PR job started failing with no clear error message available to the end user. The only error surfaced was:
Error message:
Running with dbt=0.21.0
* Deprecation Warning: The “packages” argument of adapter.dispatch() has been
deprecated. Use the “macro_namespace” argument instead.
Raised during dispatch for: type_string
For more information, see:
https://docs.getdbt.com/reference/dbt-jinja-functions/dispatch
Found 347 models, 448 tests, 0 snapshots, 0 analyses, 701 macros, 2 operations, 21 seed files, 302 sources, 0 exposures
Found a seed (<removed>) >1MB in size at the same path, dbt cannot tell if it has changed: assuming they are the same
Upon inspection of debug logs, the following log line was identified:
2021-11-04 08:50:01.342766 (MainThread): None
Upon inspection of more robust logging available to dbt Cloud support, this stack trace was identified:
Traceback (most recent call last):
File "/usr/src/app/sinter/clients/dbt.py", line 1147, in call
result, success = dbt_main.handle_and_check(
File "/usr/local/lib/python3.8/dist-packages/dbt/main.py", line 205, in handle_and_check
task, res = run_from_args(parsed)
File "/usr/local/lib/python3.8/dist-packages/dbt/main.py", line 258, in run_from_args
results = task.run()
File "/usr/local/lib/python3.8/dist-packages/dbt/task/runnable.py", line 433, in run
self._runtime_initialize()
File "/usr/local/lib/python3.8/dist-packages/dbt/task/runnable.py", line 155, in _runtime_initialize
self.job_queue = self.get_graph_queue()
File "/usr/local/lib/python3.8/dist-packages/dbt/task/runnable.py", line 146, in get_graph_queue
return selector.get_graph_queue(spec)
File "/usr/local/lib/python3.8/dist-packages/dbt/graph/selector.py", line 297, in get_graph_queue
selected_nodes = self.get_selected(spec)
File "/usr/local/lib/python3.8/dist-packages/dbt/graph/selector.py", line 278, in get_selected
selected_nodes, indirect_only = self.select_nodes(spec)
File "/usr/local/lib/python3.8/dist-packages/dbt/graph/selector.py", line 178, in select_nodes
direct_nodes, indirect_nodes = self.select_nodes_recursively(spec)
File "/usr/local/lib/python3.8/dist-packages/dbt/graph/selector.py", line 149, in select_nodes_recursively
bundles = [
File "/usr/local/lib/python3.8/dist-packages/dbt/graph/selector.py", line 150, in <listcomp>
self.select_nodes_recursively(component)
File "/usr/local/lib/python3.8/dist-packages/dbt/graph/selector.py", line 149, in select_nodes_recursively
bundles = [
File "/usr/local/lib/python3.8/dist-packages/dbt/graph/selector.py", line 150, in <listcomp>
self.select_nodes_recursively(component)
File "/usr/local/lib/python3.8/dist-packages/dbt/graph/selector.py", line 149, in select_nodes_recursively
bundles = [
File "/usr/local/lib/python3.8/dist-packages/dbt/graph/selector.py", line 150, in <listcomp>
self.select_nodes_recursively(component)
File "/usr/local/lib/python3.8/dist-packages/dbt/graph/selector.py", line 147, in select_nodes_recursively
direct_nodes, indirect_nodes = self.get_nodes_from_criteria(spec)
File "/usr/local/lib/python3.8/dist-packages/dbt/graph/selector.py", line 104, in get_nodes_from_criteria
collected = self.select_included(nodes, spec)
File "/usr/local/lib/python3.8/dist-packages/dbt/graph/selector.py", line 89, in select_included
return set(method.search(included_nodes, spec.value))
File "/usr/local/lib/python3.8/dist-packages/dbt/graph/selector_methods.py", line 514, in search
if checker(previous_node, real_node):
File "/usr/local/lib/python3.8/dist-packages/dbt/graph/selector_methods.py", line 439, in check_modified
upstream_macro_change = self.check_macros_modified(new)
File "/usr/local/lib/python3.8/dist-packages/dbt/graph/selector_methods.py", line 435, in check_macros_modified
return self.recursively_check_macros_modified(node, previous_macros)
File "/usr/local/lib/python3.8/dist-packages/dbt/graph/selector_methods.py", line 419, in recursively_check_macros_modified
macro_node = self.manifest.macros[macro_uid]
KeyError: None
From here, we compared git changes between this PR job and the previously successful PR job, and identified 2 new sources, 3 new models, and 3 new tests.
From inspection of these changes, it was identified that the source of the issue was misspellings in the new schema.yml files as follows:
- not_null tests had the value "not null"
- description was named "desciption"
- columns was named "colums"
These are misspelled invocations of a built-in test and two common yml keys. dbt Core should surface misspellings like this more obviously to the end user.
Expected Behavior
Helpful error messaging is surfaced to the end user when there are misspelled keys in the yml file
Steps To Reproduce
- Misspell a key in a schema.yml file, such as:
- a not_null test with the value "not null"
- description named "desciption"
- columns named "colums"
- Save changes and run `dbt run`
- Observe the opaque error messaging available
Relevant log output
No response
Environment
dbt Cloud
- dbt:0.21.0
What database are you using dbt with?
bigquery
Additional Context
No response
@saraleon1 Thanks for opening!
If the only error had been defining the test as `not null` instead of `not_null`, we would have raised a more helpful error message (still not amazing, but at least better):
Compilation Error in test not_model_a_id (models/schema.yml)
'test_not' is undefined. This can happen when calling a macro that does not exist. Check for typos and/or install package dependencies with "dbt deps".
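For context on why the error names `test_not`: dbt implements a generic test named X with a macro named `test_X`. A toy sketch follows; the whitespace-splitting is an illustrative assumption about how "not null" could degrade into a lookup for the undefined `test_not`, not dbt's actual parsing code:

```python
def generic_test_macro_name(test_name: str) -> str:
    """Map a generic test name to its backing macro name.

    dbt implements a generic test `unique` with a macro `test_unique`.
    Splitting on whitespace is an assumption here, sketching how the
    misspelled "not null" could resolve to the undefined `test_not`.
    """
    return "test_" + test_name.split()[0]

print(generic_test_macro_name("not_null"))  # test_not_null (exists)
print(generic_test_macro_name("not null"))  # test_not (undefined, as in the error)
```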
The misspelled `desciption` and `colums` are more insidious. It's pretty surprising that this doesn't raise a validation error:
version: 2
models:
- name: model_a
random_key: does nothing
@gshank I know we do basic validation for the `models` block when reading files, to make sure we're not setting parsing up for horrible failure later on. Do you have a sense of where we could validate to make sure that the keys in a node update block match our expected/supported keys? It's not like `random_key` is doing anything above; it doesn't appear as a node property in the manifest.
I haven't looked into this code extensively, but my sense is that we could do one of the following:
- Change a low-level class to no longer be extensible / support additional properties
- Since unrecognized fields get popped into the `_extra` dict, raise a parsing error somewhere around here if there are unrecognized top-level keys:
if block.target._extra:
raise ParsingException(f"Extra keys not allowed! {block.target._extra}")
$ dbt --no-partial-parse test
16:35:29 | [ info ] | Running with dbt=1.0.0-rc1
16:35:29 | [ error ] | Encountered an error:
Parsing Error
Extra keys not allowed! {'random_key': 'does nothing'}
I do think this could be a potential breaking change, in the sense of raising an error for existing projects that have cruft or misspelled properties hanging around. This feels like a great thing to sneak in for v1.0.0 if it would be simple, or a later minor version if it would be a much larger lift.
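A minimal, self-contained sketch of that `_extra`-based check; the class and key set below are hypothetical stand-ins for illustration, not dbt's real parser internals:

```python
class ParsingException(Exception):
    pass

# Hypothetical set of keys a model entry supports; dbt's real set differs.
KNOWN_MODEL_KEYS = {"name", "description", "columns", "tests", "config", "meta"}

def parse_model_block(raw: dict) -> dict:
    """Collect unrecognized fields (mimicking them being popped into an
    `_extra` dict), then fail loudly if anything was left over instead of
    silently ignoring it."""
    extra = {k: v for k, v in raw.items() if k not in KNOWN_MODEL_KEYS}
    if extra:
        raise ParsingException(f"Extra keys not allowed! {extra}")
    return raw

try:
    parse_model_block({"name": "model_a", "random_key": "does nothing"})
except ParsingException as exc:
    print(exc)  # Extra keys not allowed! {'random_key': 'does nothing'}
```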
There's some minimal validation done in 'get_key_dicts', but we don't actually validate that there are only valid keys in the dictionaries anywhere; we basically set the values into the attributes when constructing objects, so we just ignore invalid keys. Better validation seems like a good idea.
I'm not exactly sure what the best place and way to do it is. If we validate when everything is a dictionary, then we have to update the valid keys whenever a config or attribute is added, and it would have to be specific to the patch type. But validating as an object isn't going to work because we can't set attributes that don't exist. I think the best approach would be to have validation code and separate validation data for each entry type, and validate somewhere around get_key_dicts, but with separate validation methods for each schema subparser.
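One way to sketch that per-entry-type idea: keep a separate set of valid keys for each patch type and check each raw key dict around get_key_dicts time. The key sets and function below are illustrative assumptions, not dbt's actual schema definitions:

```python
# Illustrative key sets per patch type; the real supported keys differ.
VALID_KEYS_BY_PATCH_TYPE = {
    "models": {"name", "description", "columns", "tests", "config", "meta"},
    "sources": {"name", "description", "database", "schema", "tables"},
    "seeds": {"name", "description", "columns", "tests", "config"},
}

def validate_key_dict(patch_type: str, key_dict: dict) -> None:
    """Reject keys that the subparser for this patch type doesn't support."""
    unknown = set(key_dict) - VALID_KEYS_BY_PATCH_TYPE[patch_type]
    if unknown:
        raise ValueError(
            f"Unrecognized keys in '{patch_type}' entry: {sorted(unknown)}"
        )

try:
    validate_key_dict("models", {"name": "model_a", "desciption": "typo"})
except ValueError as exc:
    print(exc)  # Unrecognized keys in 'models' entry: ['desciption']
```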
Hey team, I've got a relevant one for you... if you have a malformed `dbt_project.yml`:
name: "my_dbt_project"
version: "1.0.0"
config-version: 2
profile: "snowflake"
model-paths: ["models"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]
target-path: "target"
clean-targets:
- "target"
- "dbt_modules"
- "dbt_packages"
# prettier-ignore
models:
+materialized:table
my_dbt_project:
A run in dbt CLI provides some helpful hints:
$ dbt run
10:00:23 Encountered an error:
Runtime Error
Syntax error near line 20
------------------------------
17 | # prettier-ignore
18 | models:
19 | +materialized:table
20 | my_dbt_project:
Raw Error:
------------------------------
mapping values are not allowed in this context
in "<unicode string>", line 20, column 19
But because no dbt logs are generated from the Runtime Error, we get this when looking at a run history in dbt Cloud:
You have no idea what's going on.
I suppose my questions are:
- Is this pertinent to this ticket (misspellings in yaml) or is this its own issue (malformed yaml causing runtime errors generating no logs)?
- Is this a dbt Cloud thing (because currently run history logs are effectively "dbt generated logs" only and not things outside of that realm - Python stack trace) or a dbt-core thing (because runtime errors generate no "dbt generated logs")? At some level I suppose it's both.
Thanks :)
@jeremyyeo I think the resolution for that discrepancy would be the proposal in #4836 — i.e. every exception that dbt-core throws should always also be a log message.
(@nathaniel-may Does that sound right to you, or am I misunderstanding that proposal?)
@jtcohen6 Yes, I imagine we would want to log errors before they are thrown whenever possible. I'm imagining having a single exception event type that wraps the parent exception to achieve this with minimal code.
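A rough sketch of that shape: one generic event type wrapping the parent exception, logged just before the raise. The event and function names here are invented for illustration; the real proposal lives in #4836:

```python
import logging

logger = logging.getLogger("dbt")

class GenericExceptionEvent:
    """Single event type that wraps any parent exception for logging."""

    def __init__(self, exc: BaseException) -> None:
        self.exc = exc

    def message(self) -> str:
        return f"{type(self.exc).__name__}: {self.exc}"

def fire_and_raise(exc: BaseException) -> None:
    """Log the exception as an event, then re-raise it, so a log line
    always exists even when the traceback never reaches the user."""
    logger.error(GenericExceptionEvent(exc).message())
    raise exc

try:
    fire_and_raise(RuntimeError("Syntax error near line 20"))
except RuntimeError:
    pass  # the error still propagates, but a log line was emitted first
```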
estimation
- option 1: we add some simple validation. ~3
- option 2: we investigate making a more holistic design choice that is more future proof for expected keys. ~2 day spike to start.
I would also like to add, on the key validation front, that optional keys like "description" quietly fail completely if they are misspelled. No error or warning, just no results in the docs.
I spent about 20 minutes trying to see that I was missing the "i" in "tion" ("descripton"), and that's why it wasn't appearing in the docs while running just fine. Maybe a warning for keys that are optional but not recognized?
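A "did you mean" warning for optional-but-unrecognized keys could be as simple as a close-match lookup. A sketch; the key set and warning wording are hypothetical:

```python
import difflib

# Illustrative set of keys a column entry accepts; dbt's real set differs.
KNOWN_COLUMN_KEYS = {"name", "description", "tests", "meta", "quote", "tags"}

def warn_on_unknown_keys(keys):
    """Return a warning per unrecognized key, suggesting the closest
    known key where one exists (e.g. descripton -> description)."""
    warnings = []
    for key in keys:
        if key in KNOWN_COLUMN_KEYS:
            continue
        matches = difflib.get_close_matches(key, KNOWN_COLUMN_KEYS, n=1)
        hint = f" Did you mean '{matches[0]}'?" if matches else ""
        warnings.append(f"Unrecognized key '{key}'.{hint}")
    return warnings

print(warn_on_unknown_keys(["name", "descripton"]))
# ["Unrecognized key 'descripton'. Did you mean 'description'?"]
```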
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.
I was about to remove the `stale` label and add `paper_cut`, but I see that we already have a newer "paper cut" issue that's basically the same as this one: https://github.com/dbt-labs/dbt-core/issues/5605
I'm going to close this older one as a duplicate of that newer one. This is still very much a UX improvement that's on my mind!
:wave: I created a PR with tests to demonstrate the issue.