dbt-bigquery
dbt-bigquery copied to clipboard
[CT-1631] [Bug] dbt snapshot fails on schema change in nested fields
Is this a new bug in dbt-bigquery?
- [X] I believe this is a new bug in dbt-bigquery
- [X] I have searched the existing issues, and I could not find an existing issue for this bug
Current Behavior
I get datatype error when trying to run snapshot on a table, where there was a schema change in a nested field(column added/removed) since the last snapshot run.
Value has type STRUCT<module STRING, reference_entity_class STRING, reference_table STRING, ...> which cannot be inserted into column created_by, which has type STRUCT<module STRING, reference_entity_class STRING, reference_table STRING, ...> at [16:33]
Attached on the left is the updated schema of the tmp table to be merged, on the right is the original table
Expected Behavior
Snapshot table is updated successfully and includes new changes in schema of the nested fields
Steps To Reproduce
- Create initial snapshot of a table that contains nested column
- Add/remove one more column to/from the nested column of the table
- Run dbt snapshot
Relevant log output
`Value has type STRUCT<module STRING, reference_entity_class STRING, reference_table STRING, ...> which cannot be inserted into column created_by, which has type STRUCT<module STRING, reference_entity_class STRING, reference_table STRING, ...> at [16:33]`
Environment
- OS:macOS-12.6-arm64-arm-64bit
- Python:3.9.12
- dbt-core:1.3.0
- dbt-bigquery:1.3.0
Additional Context
No response
I can reproduce, but I don't think that's a bug.
BigQuery doesn't support schema mutations on record fields.
From a user standpoint it looks like the field is of type STRUCT
, but from the database standpoint the type is actually STRUCT<module STRING, reference_entity_class STRING, reference_table STRING>
(let's focus on your first 3 properties). If you want to insert a new value of type STRUCT<module STRING, reference_entity_class STRING>
(missing that last field), for BigQuery it's as invalid as trying to insert a string into an integer column.
I know records come with the expectation of being schema-on-read, but that's not the case for BigQuery.
Since that's the exact role of snapshots in dbt, dealing with mutations, I'm not against adding that to the feature set.
I'll re-classify as enhancement
. I don't think we'll be able to prioritize in the short term, so I'll flag it as help_wanted
.
Thanks a lot for this issue @dashnak90
Has there been an update in dbt snapshots for solving this issue? @Fleid
@HansalShah007 the latest is my comment just above :)
The core issue is that BigQuery doesn't accept changes in nested fields. We won't be able to add a workaround in dbt in the short term, but definitely accepting contributions on the topic!