dbt-utils icon indicating copy to clipboard operation
dbt-utils copied to clipboard

union should explicitly specify table name in casts

Open wearpants opened this issue 5 years ago • 7 comments

Bigquery gets confused by tables that have a column with the same name in CASTs, and will try to cast the entire table instead of the column.

-- causes error, as foo is a struct
select cast(`foo` as STRING) as `foo` 
from `data-playground-1234`.`myschema`.`foo`

-- works when table name is specified
select cast(`foo`.`foo` as STRING) as `foo` 
from `data-playground-1234`.`myschema`.`foo`

I think this is a simple as adding table.name here, like:

{%- set col_name = adapter.quote(table.name) + '.' + adapter.quote(col_name) if col_name in table_columns[table] else 'null' %}

wearpants avatar Nov 06 '19 00:11 wearpants

I would think that's a general problem with BigQuery and would have to be addressed everywhere you cast a column, not just in this macro? The other obvious "fix" is to not have columns with the same name as a table?

clausherther avatar Nov 06 '19 15:11 clausherther

In my real world, it's a pre-existing table with a single column (plus id) so changing it will be difficult

I'm not sure I understand why needing to change it elsewhere is an argument against changing it here?

On Wed, Nov 6, 2019, 10:39 AM Claus Herther [email protected] wrote:

I would think that's a general problem with BigQuery and would have to be addressed everywhere you cast a column, not just in this macro? The other obvious "fix" is to not have columns with the same name as a table?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/fishtown-analytics/dbt-utils/issues/173?email_source=notifications&email_token=AAD3MAFLZTXQFVZY7GWEDGLQSLQMPA5CNFSM4JJNCLA2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEDG64GI#issuecomment-550366745, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAD3MAANSQZSMQQWOXXUNQLQSLQMPANCNFSM4JJNCLAQ .

wearpants avatar Nov 06 '19 18:11 wearpants

I'll try to send a PR in the next day or two if you'd be willing to accept it

On Wed, Nov 6, 2019, 1:40 PM Pete Fein [email protected] wrote:

In my real world, it's a pre-existing table with a single column (plus id) so changing it will be difficult

I'm not sure I understand why needing to change it elsewhere is an argument against changing it here?

On Wed, Nov 6, 2019, 10:39 AM Claus Herther [email protected] wrote:

I would think that's a general problem with BigQuery and would have to be addressed everywhere you cast a column, not just in this macro? The other obvious "fix" is to not have columns with the same name as a table?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/fishtown-analytics/dbt-utils/issues/173?email_source=notifications&email_token=AAD3MAFLZTXQFVZY7GWEDGLQSLQMPA5CNFSM4JJNCLA2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEDG64GI#issuecomment-550366745, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAD3MAANSQZSMQQWOXXUNQLQSLQMPANCNFSM4JJNCLAQ .

wearpants avatar Nov 06 '19 18:11 wearpants

I said as much in the linked PR, but to follow up on @clausherther's comment above: a good approach to address this issue on BQ might be to alias the table name instead of changing the source table/column names. I do wish that BigQuery was a little bit smarter here.... alas.

So, this works:

with data as (
  select 1 as data
)

select cast(data.data as string) from data;

and this also works:

with data as (
  select 1 as data
)

select cast(data as string) from data as data______dbt_generated______;

but this does not:

with data as (
  select 1 as data
)

select cast(data as string) from data;
--> Invalid cast from STRUCT<data INT64> to STRING at [7:13]

All things considered, I think i like the table alias more that the column qualification. It can be really challenging to get quoting/capitalization exactly right on different databases in different contexts. I think it's straightforward here (nice work on that fix @wearpants) but it other contexts, it may be preferable to alias tables instead.

drewbanin avatar Nov 17 '19 17:11 drewbanin

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

github-actions[bot] avatar Aug 27 '23 01:08 github-actions[bot]

Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.

github-actions[bot] avatar Sep 03 '23 01:09 github-actions[bot]

Hi @drewbanin, I also face this issue in my development with BigQuery, please help me to review the changes if it's make sense to you. Thanks!

adrian3ka avatar Jan 23 '24 02:01 adrian3ka