[CT-1146] [Bug] providers.py raises unhelpful error if dispatch provides package name that doesn't exist
Is this a new bug in dbt-core?
- [X] I believe this is a new bug in dbt-core
- [X] I have searched the existing issues, and I could not find an existing issue for this bug
Current Behavior
I want to preface this by saying I am not surprised that no one has run into this bug before. It takes a special kind of person to make changes to every macro reference in their package and then forget to update the package name in the dispatch line. That being said, here is the bug.
If adapter.dispatch provides the name of a package that doesn't exist, the error handling built into providers.py doesn't account for this pattern.
Example: The provided namespace (`metrics`) exists, so execution doesn't take the first path. In the second path, it looks up the namespace via get_macro_search_order, but that can't complete because the namespace doesn't exist in dbt_projects. The end result is that search_packages is a None object.
```python
if namespace is None:
    search_packages = [None]
elif isinstance(namespace, str):
    search_packages = self._adapter.config.get_macro_search_order(namespace)
    if not search_packages and namespace in self._adapter.config.dependencies:
        search_packages = [self.config.project_name, namespace]
```
This None object then can't be iterated over at line 167:

```python
for package_name in search_packages:
```

This produces the unhelpful error message shown in the relevant log output below.
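To make the failing control flow concrete, here is a minimal standalone sketch of the branching above, with dbt's internals stubbed out (the helper and argument names here are hypothetical, not dbt's actual API). When the namespace is a string that matches neither the macro search order nor the project's dependencies, every branch falls through and None is returned:

```python
def get_search_packages(namespace, macro_search_order, dependencies, project_name):
    # Simplified stand-in for the branching in providers.py shown above.
    search_packages = None
    if namespace is None:
        search_packages = [None]
    elif isinstance(namespace, str):
        # Stand-in for self._adapter.config.get_macro_search_order(namespace);
        # returns None when the namespace is unknown.
        search_packages = macro_search_order.get(namespace)
        if not search_packages and namespace in dependencies:
            search_packages = [project_name, namespace]
    return search_packages


# A string namespace that matches nothing falls through every branch:
packages = get_search_packages("farm_utils", {}, set(), "my_project")
print(packages)  # None

# Looping over it then raises the TypeError shown in the log output:
# for package_name in packages:  ->  TypeError: 'NoneType' object is not iterable
```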
Potential Fix?
@dbeatty10 and I were able to solve this problem by adding another path after line 154:
```python
if not search_packages and namespace in self._adapter.config.dependencies:
    search_packages = [self.config.project_name, namespace]
else:
    search_packages = [None]
```
This allows the search_packages variable to be iterated over and raises a much more helpful error message about what is going wrong.
Expected Behavior
An error message like this one:

```text
dbt.exceptions.CompilationException: Compilation Error in model develop_metric (models/metric_testing_models/develop_metric.sql)
  In dispatch: No macro named 'develop' found
      Searched for: 'postgres__develop', 'default__develop'
  > in macro develop (macros/develop.sql)
  > called by model develop_metric (models/metric_testing_models/develop_metric.sql)
```
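For context on the "Searched for" line in that message, dispatch builds adapter-prefixed candidate names and falls back to a `default__` prefix. A rough sketch (simplified; not dbt's actual implementation):

```python
def candidate_macro_names(adapter_type: str, macro_name: str) -> list:
    # Dispatch tries the adapter-specific name first, then the default.
    return [f"{adapter_type}__{macro_name}", f"default__{macro_name}"]

print(candidate_macro_names("postgres", "develop"))
# ['postgres__develop', 'default__develop']
```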
Steps To Reproduce
Loom explaining and showing issue: https://www.loom.com/share/347a5dc783cc45f2be5c8d7eede01622
Relevant log output
File "/Users/callummccann/repos/personal_dotfiles/poetry_environments/dbt-testing-metrics/.venv/lib/python3.10/site-packages/dbt/context/providers.py", line 171, in dispatch
for package_name in search_packages:
TypeError: 'NoneType' object is not iterable
Environment
- OS: MacOS
- Python: 3.10.6
- dbt: 1.2.1
Which database adapter are you using with dbt?
postgres
Additional Context
Happy to submit the fix if the team thinks that the proposed fix above is good code!
I think the fix is fine. Please submit a PR. Creating the test will probably be a lot more work than the fix :)
I prefer the fix like this (which I didn't actually try out 😅):
```python
search_packages = None

if namespace is None:
    search_packages = [None]
elif isinstance(namespace, str):
    search_packages = self._adapter.config.get_macro_search_order(namespace)
    if not search_packages and namespace in self._adapter.config.dependencies:
        search_packages = [self.config.project_name, namespace]
else:
    raise CompilationException(
        f"In adapter.dispatch, got a {type(namespace)} macro_namespace argument "
        f'("{macro_namespace}"), but macro_namespace should be None or a string.'
    )

if search_packages is None:
    search_packages = [None]
```
The first line ensures that there is an initial default value for search_packages, and the last two lines make sure it has the shape expected by the logic that uses it downstream.
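To illustrate why this shape works, here is a runnable sketch of the same logic with the dbt internals stubbed out (helper and argument names are hypothetical): the unknown-namespace case now yields [None] instead of None, and a non-string namespace raises immediately:

```python
class CompilationException(Exception):
    pass


def get_search_packages(namespace, macro_search_order, dependencies, project_name):
    search_packages = None

    if namespace is None:
        search_packages = [None]
    elif isinstance(namespace, str):
        # Stand-in for self._adapter.config.get_macro_search_order(namespace)
        search_packages = macro_search_order.get(namespace)
        if not search_packages and namespace in dependencies:
            search_packages = [project_name, namespace]
    else:
        raise CompilationException(
            f"In adapter.dispatch, got a {type(namespace)} macro_namespace argument "
            f'("{namespace}"), but macro_namespace should be None or a string.'
        )

    # Guarantee an iterable, so the downstream loop never hits a TypeError.
    if search_packages is None:
        search_packages = [None]
    return search_packages


print(get_search_packages("farm_utils", {}, set(), "my_project"))  # [None]
```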
Even better:
- Wrap all of this up in a function named `_get_search_packages` (which would make the `dispatch` function shorter and easier to read, and is probably easier to unit test too):

```python
def _get_search_packages(self, namespace: Optional[str] = None):
    ...
    return search_packages
```

And then call it:

```python
search_packages = self._get_search_packages(macro_namespace)
```

Then this extraneous line can probably be removed too:

```python
namespace = macro_namespace
```
+1, got the same error on dbt-core 1.5.0.
Thanks for sharing that @Gunnnn 👍
We've got a PR open to fix this, which probably wouldn't be included any earlier than v1.6.
Any workarounds if we can't yet upgrade to 1.6?
@pjatx the fix is really just providing a better error message to help find the source of the problem.
Either way, you'll still need to find the code that is misconfigured and fix it.
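One way to hunt down the misconfigured call without upgrading is to list every adapter.dispatch invocation in your project and installed packages, then check each macro_namespace argument against the `name:` in the corresponding dbt_project.yml. A sketch (the demo directory and file are made up for illustration; in a real project, grep your existing macros/ and dbt_packages/ directories instead):

```shell
# Demo setup: a tiny macros directory containing a dispatch call.
mkdir -p demo_macros
cat > demo_macros/cowsay.sql <<'EOF'
{% macro cowsay() %}
    {{ return(adapter.dispatch('cowsay', 'farm_utils')()) }}
{%- endmacro %}
EOF

# List every dispatch call with its namespace argument for review:
grep -rn "adapter.dispatch" demo_macros/
```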
Are you getting this problem when installing a dbt project / package provided by someone else? Or are you getting it while doing some development on your own dbt project / package?
Here's an example of some simple misconfigured code that you can try out for yourself on any version of dbt:
macros/incorrect_dispatch.sql
```sql
{% macro cowsay() %}
    {{ return(adapter.dispatch('cowsay', 'farm_utils')()) }}
{%- endmacro %}

{% macro default__cowsay() %}
    'moo'
{% endmacro %}
```
👀 Note the difference between farm_utils above and test_utils below. You probably have something like this in your project somewhere. You can fix it by replacing 'farm_utils' above with 'test_utils' (matching whatever is in the name: within dbt_project.yml for the dbt project / package that contains the misconfigured dispatched macro).
dbt_project.yml
```yaml
name: 'test_utils'
version: '1.0'
config-version: 2
profile: 'default'
macro-paths: ["macros"]
```
Once you've made the fix, then building a model like the following should work again without error:
models/my_model.sql
```sql
select {{ test_utils.cowsay() }} as cowsay
```
@dbeatty10 Thanks for the quick reply!
So in my case, it's coming from the following macro (if I remove it, the error goes away).
```sql
{% macro generate_sessionization_incremental_filter(merge_target, filter_tstamp, max_tstamp, operator) %}
    {{ return(adapter.dispatch('generate_sessionization_incremental_filter', 'segment')(merge_target, filter_tstamp, max_tstamp, operator)) }}
{% endmacro %}

{% macro default__generate_sessionization_incremental_filter(merge_target, filter_tstamp, max_tstamp, operator) %}
    where {{ filter_tstamp }} {{ operator }} (
        select
            {{ dbt.dateadd(
                'hour',
                -var('segment_sessionization_trailing_window'),
                'max(' ~ max_tstamp ~ ')'
            ) }}
        from {{ merge_target }}
    )
{%- endmacro -%}

{% macro bigquery__generate_sessionization_incremental_filter(merge_target, filter_tstamp, max_tstamp, operator) %}
    where {{ filter_tstamp }} {{ operator }} (
        select
            timestamp_sub(
                max({{ max_tstamp }}),
                interval {{ var('segment_sessionization_trailing_window') }} hour
            )
        from {{ merge_target }}
    )
{%- endmacro -%}

{% macro postgres__generate_sessionization_incremental_filter(merge_target, filter_tstamp, max_tstamp, operator) %}
    where cast({{ filter_tstamp }} as timestamp) {{ operator }} (
        select
            {{ dbt.dateadd(
                'hour',
                -var('segment_sessionization_trailing_window'),
                'max(' ~ max_tstamp ~ ')'
            ) }}
        from {{ merge_target }}
    )
{%- endmacro -%}
```
```text
16:06:31.184333 [debug] [MainThread]: Command `dbt parse` failed at 16:06:31.184280 after 0.34 seconds
16:06:31.184508 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'end', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x10604fed0>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x11feb5510>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x100d72e10>]}
16:06:31.184684 [debug] [MainThread]: Flushing usage events
```
Looks like it's related to the snowplow adapter, but that isn't actually even being used in the project. Any insights appreciated!