dbt-redshift

Not able to get columns for source, when defining unit test on a model that depends on source

Open graciegoheen opened this issue 1 year ago • 18 comments

@graciegoheen sure

I have built a dbt model (say model_a) that targets the database _analysis and depends on the source table source_b in the database _raw. The model builds fine, so I know the model itself isn't buggy, but the unit test fails with the following error:

Compilation Error in model source_b (models/source.yml)
Not able to get columns for unit test 'source_b' from relation "_raw"."public"."source_b"

> in macro get_fixture_sql (macros/unit_test_sql/get_fixture_sql.sql)
> called by model source_b (models/source.yml)
  - name: test_model_a
    description: "Test model_a model"
    model: model_a
    given:
      - input: source('_raw', 'source_b')
        rows:
          - ...
      - input: ref('model_c')
        rows:
          - ...
    expect:
      rows:
        - ...

Additional Information:

  • I don't have any unit tests defined for source_b
  • "_raw"."public"."source_b" does exist in the database and my dbt profile is able to read from it (because the model gets built)
  • the unit test works if I swap out source_b with a table-materialized model of source_b, in both the unit test input and in model_a itself (a sketch of that workaround follows this list)

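For illustration, that workaround looks roughly like the following (the file name stg_source_b.sql and its contents are hypothetical, just to show the shape of a table-materialized wrapper over the source):

    -- models/staging/stg_source_b.sql (hypothetical path and name)
    -- Materializing the source as a table puts a relation with the source's
    -- columns and types where the unit test can introspect it.
    {{ config(materialized='table') }}

    select * from {{ source('_raw', 'source_b') }}

model_a and the unit test's given: input then reference ref('stg_source_b') instead of source('_raw', 'source_b').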
I'm happy to share more and/or contribute to resolving this ticket.

Originally posted by @kd-nishanth in https://github.com/dbt-labs/dbt-core/issues/9686#issuecomment-2032609716

graciegoheen avatar Apr 03 '24 17:04 graciegoheen

Hi @kd-nishanth - moved this conversation over to a new issue as I don't think this is related to the ephemeral model problem. The direct parents of the model that you're unit testing (in your example, anything model_a depends on: source_b and model_c) need to exist in the warehouse before you can execute the unit test, because we need to know what field names and data types to use when constructing the unit test SQL.
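To make that concrete, the fixture dbt builds for each given: input is a block of literal rows cast to the column types it reads from the warehouse relation. A hand-written sketch of the general shape (the column names and types here are invented, not taken from this issue):

    -- Hypothetical fixture for source('_raw', 'source_b'). dbt needs the real
    -- column list and types from the warehouse both to write these casts and
    -- to null-fill any columns omitted from the fixture rows.
    select cast(1 as integer) as id, cast('2024-01-01' as date) as created_at
    union all
    select cast(2 as integer) as id, cast(null as date) as created_at

If the columns can't be fetched for a direct parent, that's where "Not able to get columns for unit test" comes from.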

Given that:

  • your source_b does exist in the database and your dbt profile is able to read from it
  • the unit test works if you swap out source_b with a table-materialized model of source_b, in both the unit test and in model_a

this is quite perplexing!

Could you do two things:

  • check if you're able to run dbt docs generate for source_b
  • send an example of what your source yml definition looks like for source_b

graciegoheen avatar Apr 03 '24 17:04 graciegoheen

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

github-actions[bot] avatar Jul 03 '24 01:07 github-actions[bot]

Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.

github-actions[bot] avatar Jul 10 '24 01:07 github-actions[bot]

Hi @graciegoheen ,

I'm running into this same problem (actually, with the same codebase @kd-nishanth was running into the problem with). So I'm hoping there is something wrong with our source definition yml.

Here is what we have (anonymized):

  - name: _raw
    database: _raw
    schema: public
    description: raw data
    tables:
      - name: source_b

Also, I tried running docs for the source, and I keep getting this error (I'm not sure if it's related to our test issue):

    Warning: The database user "r****@***.com" has insufficient permissions to
    query the "svv_table_info" table. Please grant SELECT permissions on this table
    to the "r****@***.com" user to fetch extended table details from Redshift.

Thanks in advance! It would be fantastic to unlock unit testing with source inputs.

ronco avatar Feb 04 '25 22:02 ronco

Of note, if I run the SQL it's using to gather the table info directly in the Redshift console myself, it works. (I found the SQL by running the test with the debug flag on.)

ronco avatar Feb 04 '25 22:02 ronco

@graciegoheen any traction on this? It would be great to be able to unit test models that depend on a source.

dkreth avatar Mar 04 '25 21:03 dkreth

Also, I tried running docs for the source, and I keep getting this error (I'm not sure if it's related to our test issue):

The warning message indicates that you don't have appropriate permissions to query this table; I am wondering if that's why you're not able to get the columns for the source.
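If permissions do turn out to be the problem, the warning's own suggested fix would be a grant along these lines (sketch only; "<dbt_user>" is a placeholder for the user dbt connects as, and whether this alone resolves the column lookup is not certain):

    -- Mirrors the suggestion in the dbt-redshift warning quoted above.
    grant select on svv_table_info to "<dbt_user>";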

Of note, if I run the SQL it's using to gather the table info directly in the Redshift console myself, it works. (I found the SQL by running the test with the debug flag on.)

@ronco are you using the same user in the Redshift console, with the same permissions?

graciegoheen avatar Mar 20 '25 17:03 graciegoheen

Hey @graciegoheen ,

I think I know what's going on. When I run the get_fixture_sql SQL against the _raw database, data is returned. However, when I run it against the _analysis db, there are no results. So I'm guessing that macro is using the target connection rather than the source db connection to fetch metadata about the source table. Does that seem possible?
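That would line up with how Redshift scopes its metadata: information_schema.columns only covers the database of the current connection, so a check like the one below (schema and table names taken from the source definition above) returns rows when connected to _raw and nothing when connected to _analysis:

    -- Run once while connected to _raw and once while connected to _analysis;
    -- information_schema is scoped to the connected database.
    select column_name, data_type
    from information_schema.columns
    where table_schema = 'public'
      and table_name = 'source_b';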

ronco avatar Mar 21 '25 17:03 ronco

So I'm guessing that macro is using the target connection rather than the source db connection to fetch metadata about the source table. Does that seem possible?

You might be onto something @ronco!

Does this look like the query you saw in the debug logs? If not, could you share (an appropriately redacted) version of the query you saw in there so we can try to figure out where in this codebase the error is coming from?

dbeatty10 avatar Mar 28 '25 02:03 dbeatty10

Hey @dbeatty10 ! Thanks for following up.

Actually, the query immediately prior to the failure is this:

    with bound_views as (
      select
        ordinal_position,
        table_schema,
        column_name,
        data_type,
        character_maximum_length,
        numeric_precision,
        numeric_scale

      from information_schema."columns"
      where table_name = '<table_name>'
    ),

    unbound_views as (
      select
        ordinal_position,
        view_schema,
        col_name,
        case
          when col_type ilike 'character varying%' then
            'character varying'
          when col_type ilike 'numeric%' then 'numeric'
          else col_type
        end as col_type,
        case
          when col_type like 'character%'
          then nullif(REGEXP_SUBSTR(col_type, '[0-9]+'), '')::int
          else null
        end as character_maximum_length,
        case
          when col_type like 'numeric%'
          then nullif(
            SPLIT_PART(REGEXP_SUBSTR(col_type, '[0-9,]+'), ',', 1),
            '')::int
          else null
        end as numeric_precision,
        case
          when col_type like 'numeric%'
          then nullif(
            SPLIT_PART(REGEXP_SUBSTR(col_type, '[0-9,]+'), ',', 2),
            '')::int
          else null
        end as numeric_scale

      from pg_get_late_binding_view_cols()
      cols(view_schema name, view_name name, col_name name,
           col_type varchar, ordinal_position int)
      where view_name = '<table_name>'
    ),

    external_views as (
      select
        columnnum,
        schemaname,
        columnname,
        case
          when external_type ilike 'character varying%' or external_type ilike 'varchar%'
          then 'character varying'
          when external_type ilike 'numeric%' then 'numeric'
          else external_type
        end as external_type,
        case
          when external_type like 'character%' or external_type like 'varchar%'
          then nullif(
            REGEXP_SUBSTR(external_type, '[0-9]+'),
            '')::int
          else null
        end as character_maximum_length,
        case
          when external_type like 'numeric%'
          then nullif(
            SPLIT_PART(REGEXP_SUBSTR(external_type, '[0-9,]+'), ',', 1),
            '')::int
          else null
        end as numeric_precision,
        case
          when external_type like 'numeric%'
          then nullif(
            SPLIT_PART(REGEXP_SUBSTR(external_type, '[0-9,]+'), ',', 2),
            '')::int
          else null
        end as numeric_scale
      from
        pg_catalog.svv_external_columns
      where
        schemaname = '<table_schema>'
        and tablename = '<table_name>'

    ),

    unioned as (
      select * from bound_views
      union all
      select * from unbound_views
      union all
      select * from external_views
    )

    select
      column_name,
      data_type,
      character_maximum_length,
      numeric_precision,
      numeric_scale

    from unioned
    
    where table_schema = '<table_schema>'
    
    order by ordinal_position

Hope that helps.

ronco avatar Apr 01 '25 16:04 ronco

I'm guessing that macro is using the target connection rather than the source db connection to fetch metadata about the source table. Does that seem possible?

The query you mentioned above looks like it's coming from redshift__get_columns_in_relation.

I would assume that this query would run using your target connection -- I'm not aware of anything in dbt that switches the database used in the connection based on the database defined within a dbt source.

So that could explain it, if this query can't run successfully when using your target connection.

dbeatty10 avatar Apr 01 '25 17:04 dbeatty10

Definitely makes sense. Is there any way to override or update this code to use the source connection rather than the target connection when gathering metadata about a source?

ronco avatar Apr 01 '25 17:04 ronco

You can override this macro within your project by creating a file like the following:

{% macro redshift__get_columns_in_relation(relation) -%}
  {% call statement('get_columns_in_relation', fetch_result=True) %}
    -- The bulk of your SQL query goes here

    select
      column_name,
      data_type,
      character_maximum_length,
      numeric_precision,
      numeric_scale
    from ...
    order by ordinal_position
  {% endcall %}
  {% set table = load_result('get_columns_in_relation').table %}
  {{ return(sql_convert_columns_in_relation(table)) }}
{% endmacro %}

But I don't know what query you'd put in there!

I believe the key problem would be to figure out if Redshift has a way to do cross-database queries where you can get the column names + data types, etc.
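One possibility worth checking (an assumption on my part, not something confirmed in this thread): on RA3 node types Redshift exposes cross-database column metadata through the SVV_REDSHIFT_COLUMNS system view, so a query in roughly this shape could fetch the source's columns from the target connection:

    -- Sketch only; '<database>', '<schema>' and '<table>' are placeholders,
    -- and this relies on the cluster supporting cross-database queries (RA3).
    select
      column_name,
      data_type,
      character_maximum_length,
      numeric_precision,
      numeric_scale
    from svv_redshift_columns
    where database_name = '<database>'
      and schema_name = '<schema>'
      and table_name = '<table>'
    order by ordinal_position;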

dbeatty10 avatar Apr 01 '25 18:04 dbeatty10

That makes sense. Though I can't seem to come up with the SQL to query a separate database's information schema. Is there any way to make that macro execute against a separate connection?

ronco avatar Apr 01 '25 20:04 ronco

Is there any way to make that macro execute against a separate connection?

To the best of my knowledge, there is no way to use a separate connection at the macro level. Rather, you can only switch connection details at the level of dbt commands via the --profile or --target flags.

Like:

dbt build --profile dev_profile

Or:

dbt build --target dev_target

dbeatty10 avatar Apr 01 '25 21:04 dbeatty10

Transferred from https://github.com/dbt-labs/dbt-core/issues/9851

dbeatty10 avatar Apr 01 '25 21:04 dbeatty10

@ronco I'm handing this off to our team that maintains our dbt-redshift adapter. They have more expertise on all the nitty-gritty details of what is/isn't possible.

dbeatty10 avatar Apr 01 '25 22:04 dbeatty10

Thanks @dbeatty10 , I appreciate the help.

ronco avatar Apr 01 '25 23:04 ronco