Not able to get columns for a source when defining a unit test on a model that depends on that source
@graciegoheen sure
I have built a dbt model (say model_a) that targets the database _analysis and depends on a source table source_b in the database _raw. The model builds fine, so I know the model itself isn't buggy, but the unit test fails with the following error:
Compilation Error in model source_b (models/source.yml)
Not able to get columns for unit test 'source_b' from relation "_raw"."public"."source_b"
> in macro get_fixture_sql (macros/unit_test_sql/get_fixture_sql.sql)
> called by model source_b (models/source.yml)
- name: test_model_a
  description: "Test model_a model"
  model: model_a
  given:
    - input: source('_raw', 'source_b')
      rows:
        - ...
    - input: ref('model_c')
      rows:
        - ...
  expect:
    rows:
      - ...
Additional Information:
- I don't have any unit tests defined for source_b
- "_raw"."public"."source_b" does exist in the database and my dbt profile is able to read from it (because the model gets built)
- the unit test works if I swap out source_b with a table-materialized model of source_b in the unit test and in the model for model_a.
I'm happy to share more and/or contribute to resolving this ticket.
Originally posted by @kd-nishanth in https://github.com/dbt-labs/dbt-core/issues/9686#issuecomment-2032609716
Hi @kd-nishanth - moved this conversation over to a new issue as I don't think this is related to the ephemeral model problem. The direct parents of the model that you’re unit testing (in your example, anything model_a depends on - source_b and model_c) need to exist in the warehouse before you can execute the unit test, because we need to know what field names and data types to use when constructing the unit test sql.
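For illustration only, the fixture SQL dbt builds for each given input looks roughly like the sketch below. The columns id and name are made up for this example; dbt reads the real column names and data types from the parent relation, which is why that relation has to exist before the unit test can run:
-- illustrative sketch, not the exact output of get_fixture_sql
-- 'id' and 'name' are hypothetical columns discovered from the parent relation
with source_b__fixture as (
    select cast(1 as integer) as id, cast('foo' as varchar) as name
    union all
    select cast(2 as integer) as id, cast('bar' as varchar) as name
)
select * from source_b__fixture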
Given that:
- your source_b does exist in the database and your dbt profile is able to read from it
- the unit test works if you swap out source_b with a table-materialized model of source_b in the unit test and in the model for model_a
this is quite perplexing!
Could you do two things:
- check if you're able to run dbt docs generate for source_b
- send an example of what your source yml definition looks like for source_b
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.
Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.
Hi @graciegoheen ,
I'm running into this same problem (actually, with the same codebase @kd-nishanth was working with), so I'm hoping there is something wrong with our source definition yml.
Here is what we have (anonymized):
- name: _raw
  database: _raw
  schema: public
  description: raw data
  tables:
    - name: source_b
Also, I tried running docs for the source, and I keep getting this error (I'm not sure if it's related to our test issue):
Warning: The database user "r****@***.com" has insufficient permissions to
query the "svv_table_info" table. Please grant SELECT permissions on this table
to the "r****@***.com" user to fetch extended table details from Redshift.
Thanks in advance! It would be fantastic to unlock unit testing with source inputs.
Of note: if I run the SQL it's using to gather the table info directly in the Redshift console myself, it works. (I found the SQL by running the test with the debug flag on.)
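(For reference, the compiled query shows up in the logs when the unit test is run with dbt's --debug flag enabled, e.g. something like dbt --debug test --select model_a; the selector here is only an example.)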
@graciegoheen any traction on this? It would be great to be able to unit test models that depend on a source.
> Also, I tried running docs for the source, and I keep getting this error (I'm not sure if it's related to our test issue):
The warning message indicates that you don't have appropriate permissions to query this table; I'm wondering if that's why you're not able to get the columns for the source.
> Of note: if I run the SQL it's using to gather the table info directly in the Redshift console myself, it works. (I found the SQL by running the test with the debug flag on.)
@ronco are you using the same user in the Redshift console? with the same permissions?
Hey @graciegoheen ,
I think I know what's going on. When I run the get_fixture_sql SQL against the _raw database, data is returned. However, when I run it against the _analysis db there are no results. So I'm guessing that macro is using the target connection rather than the source db connection to fetch metadata about the source table. Does that seem possible?
> So I'm guessing that macro is using the target connection rather than the source db connection to fetch metadata about the source table. Does that seem possible?
You might be onto something @ronco!
Does this look like the query you saw in the debug logs? If not, could you share (an appropriately redacted) version of the query you saw in there so we can try to figure out where in this codebase the error is coming from?
Hey @dbeatty10 ! Thanks for following up.
Actually, the query immediately prior to the failure is this:
with bound_views as (
select
ordinal_position,
table_schema,
column_name,
data_type,
character_maximum_length,
numeric_precision,
numeric_scale
from information_schema."columns"
where table_name = '<table_name>'
),
unbound_views as (
select
ordinal_position,
view_schema,
col_name,
case
when col_type ilike 'character varying%' then
'character varying'
when col_type ilike 'numeric%' then 'numeric'
else col_type
end as col_type,
case
when col_type like 'character%'
then nullif(REGEXP_SUBSTR(col_type, '[0-9]+'), '')::int
else null
end as character_maximum_length,
case
when col_type like 'numeric%'
then nullif(
SPLIT_PART(REGEXP_SUBSTR(col_type, '[0-9,]+'), ',', 1),
'')::int
else null
end as numeric_precision,
case
when col_type like 'numeric%'
then nullif(
SPLIT_PART(REGEXP_SUBSTR(col_type, '[0-9,]+'), ',', 2),
'')::int
else null
end as numeric_scale
from pg_get_late_binding_view_cols()
cols(view_schema name, view_name name, col_name name,
col_type varchar, ordinal_position int)
where view_name = '<table_name>'
),
external_views as (
select
columnnum,
schemaname,
columnname,
case
when external_type ilike 'character varying%' or external_type ilike 'varchar%'
then 'character varying'
when external_type ilike 'numeric%' then 'numeric'
else external_type
end as external_type,
case
when external_type like 'character%' or external_type like 'varchar%'
then nullif(
REGEXP_SUBSTR(external_type, '[0-9]+'),
'')::int
else null
end as character_maximum_length,
case
when external_type like 'numeric%'
then nullif(
SPLIT_PART(REGEXP_SUBSTR(external_type, '[0-9,]+'), ',', 1),
'')::int
else null
end as numeric_precision,
case
when external_type like 'numeric%'
then nullif(
SPLIT_PART(REGEXP_SUBSTR(external_type, '[0-9,]+'), ',', 2),
'')::int
else null
end as numeric_scale
from
pg_catalog.svv_external_columns
where
schemaname = '<table_schema>'
and tablename = '<table_name>'
),
unioned as (
select * from bound_views
union all
select * from unbound_views
union all
select * from external_views
)
select
column_name,
data_type,
character_maximum_length,
numeric_precision,
numeric_scale
from unioned
where table_schema = '<table_schema>'
order by ordinal_position
Hope that helps.
> I'm guessing that macro is using the target connection rather than the source db connection to fetch metadata about the source table. Does that seem possible?
The query you mentioned above looks like it's coming from redshift__get_columns_in_relation.
I would assume that this query would run using your target connection -- I'm not aware of anything in dbt that switches the database used in the connection based on the database defined within a dbt source.
So that could explain it if this query can't run successfully when using your target connection.
Definitely makes sense. Is there any way to override or update this code to use the source connection rather than the target connection when gathering metadata about a source?
You can override this macro within your project by creating a file like the following:
{% macro redshift__get_columns_in_relation(relation) -%}
{% call statement('get_columns_in_relation', fetch_result=True) %}
-- The bulk of your SQL query goes here
select
column_name,
data_type,
character_maximum_length,
numeric_precision,
numeric_scale
from ...
order by ordinal_position
{% endcall %}
{% set table = load_result('get_columns_in_relation').table %}
{{ return(sql_convert_columns_in_relation(table)) }}
{% endmacro %}
But I don't know what query you'd put in there!
I believe the key problem would be to figure out if Redshift has a way to do cross-database queries where you can get the column names + data types, etc.
That makes sense. Though I can't seem to come up with the sql to query a separate database for information schemas. Is there any way to make that macro execute against a separate connection?
> Is there any way to make that macro execute against a separate connection?
To the best of my knowledge, there is no way to use a separate connection at the macro level. Rather, you can only switch connection details at the level of dbt commands via the --profile or --target flags.
Like:
dbt build --profile dev_profile
Or:
dbt build --target dev_target
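For what it's worth, one candidate for the cross-database metadata query mentioned above is Redshift's svv_all_columns system view, which spans databases (availability depends on the cluster type and the user's permissions). An untested sketch of what could go inside the overridden macro:
-- untested sketch: cross-database column metadata via svv_all_columns
-- '_raw', 'public', and 'source_b' stand in for the real identifiers
select
    column_name,
    data_type,
    character_maximum_length,
    numeric_precision,
    numeric_scale
from svv_all_columns
where database_name = '_raw'
  and schema_name = 'public'
  and table_name = 'source_b'
order by ordinal_position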
Transferred from https://github.com/dbt-labs/dbt-core/issues/9851
@ronco I'm handing this off to our team that maintains our dbt-redshift adapter. They have more expertise on all the nitty-gritty details of what is/isn't possible.
Thanks @dbeatty10 , I appreciate the help.