dbt-redshift
[Bug] `adapter.get_columns_in_relation` does not work cross database
Is this a new bug in dbt-redshift?
- [X] I believe this is a new bug in dbt-redshift
- [X] I have searched the existing issues, and I could not find an existing issue for this bug
Current Behavior
Not sure if this warrants its own issue, but it's essentially an extension of #179: `adapter.get_columns_in_relation` does not support introspecting a table that is in a different database.
Expected Behavior
Just like #179, we need to make `adapter.get_columns_in_relation` work across databases.
Steps To Reproduce
- Using an RA3 Redshift cluster with two databases (`dev` and `sources_only`), add two tables:
-- using a connection to database sources_only
create table sources_only.public.source_foo as select 1 id;
-- using a connection to database dev
create table dev.public.source_bar as select 1 id;
- Connect your dbt project to database `dev`:
# ~/.dbt/profiles.yml
redshift:
target: default
outputs:
default:
type: redshift
host: ...
port: 5439
database: dev
user: root
password: ...
schema: public
ra3_node: true
- Add 2 sources:
# models/sources.yml
version: 2
sources:
- name: from_default_db
schema: public
tables:
- name: source_bar
- name: from_the_other_db
database: sources_only
schema: public
tables:
- name: source_foo
- Add a model to test:
-- models/checker.sql
select * from {{ source('from_the_other_db', 'source_foo') }}
union all
select * from {{ source('from_default_db', 'source_bar') }}
$ dbt compile -s checker
02:42:54 Running with dbt=1.7.9
02:42:55 Registered adapter: redshift=1.7.4
02:42:57 Found 1 model, 2 sources, 0 exposures, 0 metrics, 606 macros, 0 groups, 0 semantic models
02:42:57
02:43:03 Concurrency: 1 threads (target='rs')
02:43:03
02:43:03 Compiled node 'checker' is:
select * from "sources_only"."public"."source_foo"
union all
select * from "dev"."public"."source_bar"
$ dbt show -s checker
02:44:36 Running with dbt=1.7.9
02:44:38 Registered adapter: redshift=1.7.4
02:44:40 Found 1 model, 2 sources, 0 exposures, 0 metrics, 606 macros, 0 groups, 0 semantic models
02:44:40
02:44:46 Concurrency: 1 threads (target='rs')
02:44:46
02:44:47 Previewing node 'checker':
| id |
| -- |
| 1 |
| 1 |
- Test out `get_columns_in_relation`:
-- models/checker.sql
{% set c1 = adapter.get_columns_in_relation(source('from_the_other_db', 'source_foo')) %}
{% set c2 = adapter.get_columns_in_relation(source('from_default_db', 'source_bar')) %}
---------
{{ c1 }}
----------
{{ c2 }}
$ dbt compile -s checker
02:48:36 Running with dbt=1.7.9
02:48:37 Registered adapter: redshift=1.7.4
02:48:39 Found 1 model, 2 sources, 0 exposures, 0 metrics, 606 macros, 0 groups, 0 semantic models
02:48:39
02:48:44 Concurrency: 1 threads (target='rs')
02:48:44
02:48:48 Compiled node 'checker' is:
---------
[]
----------
[<Column id (integer)>]
^ We didn't manage to retrieve column `id` for the source that is in the other database (`sources_only`).
The reason for that is straightforward: the `get_columns_in_relation` implementation queries `information_schema."columns"`, which is scoped to the current database, so running that query directly in Redshift returns nothing for a relation that lives in another database.
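A simplified sketch of the kind of lookup the macro performs (not the exact macro SQL; the schema/table filters below match the reproduction above):

```sql
-- Run while connected to database "dev".
-- information_schema."columns" is scoped to the current database, so this
-- returns zero rows even though sources_only.public.source_foo exists.
select
    column_name,
    data_type,
    character_maximum_length,
    numeric_precision,
    numeric_scale
from information_schema."columns"
where table_schema = 'public'
  and table_name = 'source_foo';
```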
Relevant log output
No response
Environment
- OS: macOS
- Python: 3.11
- dbt-core: 1.7.9
- dbt-redshift: 1.7.4
Additional Context
This results in https://github.com/dbt-labs/dbt-codegen/issues/167, which further materializes in the dbt Cloud IDE as users try to use the "generate model" function and find that it doesn't work as expected.
Looks like `svv_redshift_columns` (https://docs.aws.amazon.com/redshift/latest/dg/cross-database-overview.html) might be what we want to query here.
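For illustration, a sketch of what a lookup against `svv_redshift_columns` could look like for the reproduction above (column names taken from the AWS documentation for that view; treat the exact query shape as an assumption, not a proposed fix):

```sql
-- Sketch: unlike information_schema."columns", svv_redshift_columns spans
-- all databases on the cluster (including datashares), so this can see
-- sources_only.public.source_foo from a connection to dev.
select
    column_name,
    data_type,
    character_maximum_length,
    numeric_precision,
    numeric_scale
from svv_redshift_columns
where database_name = 'sources_only'
  and schema_name = 'public'
  and table_name = 'source_foo'
order by ordinal_position;
```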
We are running into the same issue. We are trying to create the dbt output in a different database than the one where the source data is located.
adapter.get_columns_in_relation is not returning any columns for tables located in a different database.
I tracked it down to this line where information_schema."columns" is used. This table only returns information about the current database and therefore returns an empty list of columns for tables located in another database.
Maybe something like this could replace `information_schema."columns"`:
SELECT ordinal_position,
       table_name,
       column_name,
       data_type,
       character_maximum_length,
       numeric_precision,
       numeric_scale
FROM SVV_ALL_COLUMNS
WHERE database_name = '<database>'
  AND schema_name = '<schema>'
  AND table_name = '<table>'
We are running into the same issue
@VolkerSchiewe are you also using RA3 nodes?
> We are running into the same issue
>
> @VolkerSchiewe are you also using RA3 nodes?

yes
@VolkerSchiewe @jeremyyeo I just opened #738, do you want to try its version of `redshift__get_columns_in_relation()`? I'm also going to ask the Redshift team for advice.
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.
@dataders any update on that topic? Have you talked to the redshift team?
hey @VolkerSchiewe yeah we've been in discussion with them. Have you looked over #742? The consensus between their team and ours right now is that rather than solving one-off issues like this one, we should holistically address cross-database functionality.
Are you on a team currently that would benefit from this feature? If so I can log this internally so we can track interest over time.
Hey @dataders, my team and I are also facing this exact issue. We were planning to use data shares to get our production tables into a lower environment, but the share comes in as a separate database.
Whenever I look at dbt logs, I see that it queries information_schema."columns" which doesn't return those data-shared (separate database) tables.
However, when I query svv_redshift_columns, we see those production tables (datashared) in this table.
This would help us tremendously and is a huge blocker for us to get our lower environment setup.
@bteh the solution for Redshift that I came up with in the meantime is to create a macro like this:
{% macro get_redshift_columns(database, schema, table) %}
    {% set columns = [] %}
    {% if execute %}
        {% set columns = run_query("SHOW COLUMNS FROM TABLE " ~ database ~ "." ~ schema ~ "." ~ table) %}
    {% endif %}
    {% do return(columns) %}
{% endmacro %}
and then call it in your model.
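For example, a hypothetical model that calls it (note that `run_query` returns an Agate result table, not dbt `Column` objects, and the `column_name`/`data_type` keys assume the shape of Redshift's `SHOW COLUMNS` output — verify against your cluster):

```sql
-- models/checker_macro.sql (illustrative)
{% set cols = get_redshift_columns('sources_only', 'public', 'source_foo') %}

{% if execute %}
    {# Log each column name/type pair returned by SHOW COLUMNS #}
    {% for row in cols.rows %}
        {% do log(row['column_name'] ~ ': ' ~ row['data_type'], info=true) %}
    {% endfor %}
{% endif %}

select * from {{ source('from_the_other_db', 'source_foo') }}
```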
It seems that this issue also affects codegen's generate_source command (https://github.com/dbt-labs/dbt-codegen/blob/0.12.1/macros/generate_source.sql)