terraform-provider-google
BigQuery table schema update fails/tries to replace the table if a column gets removed
Community Note
- Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request.
- Please do not leave +1 or me too comments, they generate extra noise for issue followers and do not help prioritize the request.
- If you are interested in working on this issue or have submitted a pull request, please leave a comment.
- If an issue is assigned to the modular-magician user, it is either in the process of being autogenerated, or is planned to be autogenerated soon. If an issue is assigned to a user, that user is claiming responsibility for the issue. If an issue is assigned to hashibot, a community member has claimed the issue already.
Terraform Version
$ terraform -v
Terraform v1.2.8
on windows_amd64
+ provider registry.terraform.io/hashicorp/google v4.34.0
Affected Resource(s)
- google_bigquery_table
Terraform Configuration Files
original config
resource "google_bigquery_table" "test_table" {
dataset_id = "dataset_id"
table_id = "table_id"
schema = <<-EOF
[
{
"name": "field1",
"type": "STRING"
},
{
"name": "field2",
"type": "STRING"
},
{
"name": "field3",
"type": "STRING"
}
]
EOF
}
modified config
resource "google_bigquery_table" "test_table" {
dataset_id = "dataset_id"
table_id = "table_id"
schema = <<-EOF
[
{
"name": "field1",
"type": "STRING"
},
{
"name": "field3",
"type": "STRING"
}
]
EOF
}
Debug Output
Terraform used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
-/+ destroy and then create replacement

Terraform will perform the following actions:

  # *.google_bigquery_table.* must be replaced
-/+ resource "google_bigquery_table" "*" {
      ~ creation_time       = * -> (known after apply)
      ~ etag                = "*" -> (known after apply)
      ~ expiration_time     = 0 -> (known after apply)
      ~ id                  = "*" -> (known after apply)
      - labels              = {} -> null
      ~ last_modified_time  = * -> (known after apply)
      ~ location            = "EU" -> (known after apply)
      ~ num_bytes           = 0 -> (known after apply)
      ~ num_long_term_bytes = 0 -> (known after apply)
      ~ num_rows            = 0 -> (known after apply)
      ~ project             = "*" -> (known after apply)
      ~ schema              = jsonencode(
          ~ [ # forces replacement
                {
                    name = "field1"
                    type = "STRING"
                },
              - {
                  - name = "field2"
                  - type = "STRING"
                },
                {
                    name = "field3"
                    type = "STRING"
                },
            ]
        )
      ~ self_link           = "*" -> (known after apply)
      ~ type                = "TABLE" -> (known after apply)
        # (3 unchanged attributes hidden)
    }

Plan: 1 to add, 0 to change, 1 to destroy.

Error: googleapi: Error 409: Already Exists: Table *:*.*, duplicate

  with module.*.google_bigquery_table.*,
  on ../../modules/*/bigquery.tf line 157, in resource "google_bigquery_table" "*":
 157: resource "google_bigquery_table" "*" {
Panic Output
Expected Behavior
There should be a config option (for example ignore_unknown_columns) to influence what happens to columns that have been removed from the TF config:
- keep them untouched and ignore them from now on, OR
- keep the current behaviour
At the 8th step in "Steps to Reproduce", terraform should either fail / try to replace the table (if the new option is not set, just like now), or it should successfully apply the changes and still keep the columns in BQ that are no longer present in the TF schema (if the new option is enabled).
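A minimal sketch of what that could look like, using the hypothetical ignore_unknown_columns name from above (this attribute does not exist in the provider today):

resource "google_bigquery_table" "test_table" {
  dataset_id = "dataset_id"
  table_id   = "table_id"

  # Hypothetical attribute, not part of the provider:
  # true  -> columns removed from the TF schema are kept untouched in BQ
  # false -> current behaviour (removed columns force replacement)
  ignore_unknown_columns = true

  schema = <<-EOF
    [
      {
        "name": "field1",
        "type": "STRING"
      },
      {
        "name": "field3",
        "type": "STRING"
      }
    ]
  EOF
}

With the flag enabled, applying this after the original config would leave field2 in place in BigQuery and simply stop managing it.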
Actual Behavior
At the 8th step in "Steps to Reproduce", terraform fails. See "Debug Output".
Steps to Reproduce
- have an auto-generated BQ schema (for example from proto, MySQL CDC, whatever)
- create a bigquery table config in terraform with that schema
- tf apply (table gets created)
- use that table and fill all the columns with data
- get a warning from developers/etc. that their scripts require all those fields, so they must never be removed
- the source of the auto-generated schema drops one of those fields
- the auto-generated schema gets regenerated with one column fewer
- tf apply
IMO this is quite a common pattern: people mirror/ingest/capture something from another system into BQ, and just because the source of the data no longer has/fills a certain column, the users/scripts/etc. might still need that field, even if it only contains values for older entries.
Important Factoids
I wanted to emphasize here that I was torn between filing this as a bug or an enhancement. I chose bug, because it prevents such a common use-case, and tries to solve it in a catastrophically wrong way (table replacement).
References
@nbali did you review the limitation on the schema below?
https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/bigquery_table#schema
@edwardmedia I did, and I get why it is like that, BUT I don't see how that is relevant in the long term. IMO it's not a limitation that can't be worked around. It's just a limitation that hasn't been handled in a more thorough manner YET. Both values are JSONs. They can be diffed, merged, etc. This config option would require the current and the expected schema to be merged together (if a column is missing from the expected schema but present in the current one, add it to the expected schema), and the merged result applied to the table. It would still fail if the update is not doable (like changing a type from record to integer), but it wouldn't fail due to the "deprecated" columns.
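Until such a merge exists, the closest coarse workaround with today's provider seems to be to stop tracking schema drift entirely via Terraform's lifecycle meta-argument; a sketch (note it ignores all schema changes, not just removed columns, so it only approximates the "keep and ignore" option):

resource "google_bigquery_table" "test_table" {
  dataset_id = "dataset_id"
  table_id   = "table_id"
  schema     = <<-EOF
    [
      {
        "name": "field1",
        "type": "STRING"
      }
    ]
  EOF

  lifecycle {
    # Terraform core feature: changes to the schema argument are no
    # longer planned at all, so a removed column can't force a
    # replacement, but intended schema additions stop being applied too.
    ignore_changes = [schema]
  }
}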
@nbali are you able to create a config with plain resources (not a module) that reproduces the issue?
@edwardmedia I have updated the issue with original + modified config, and the received output.
@nbali somehow I can't repro the issue. Are you able to run from the original config to the modified config directly without hitting the error below? After adding deletion_protection = false to your code, it seems to be fine.
Error: cannot destroy instance without setting deletion_protection=false and running `terraform apply`
Did I miss anything here in the repro?
Well, I think you missed probably the most important business aspect of this issue: the table should never be dropped, and that deletion protection should never be turned off. I mean, doesn't it just drop the table and create a new one in that case? That is an even worse scenario than being unable to update a schema.
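For reference, the two guards in question are both real features and can be combined; a minimal sketch:

resource "google_bigquery_table" "test_table" {
  dataset_id = "dataset_id"
  table_id   = "table_id"

  # Provider-level guard: the provider refuses to delete the table
  # unless this is set to false first (it defaults to true).
  deletion_protection = true

  lifecycle {
    # Terraform-core guard: any plan that would destroy this resource,
    # including a destroy-and-recreate, errors out at plan time.
    prevent_destroy = true
  }
}

With prevent_destroy set, the schema change above would fail at plan time instead of attempting the replacement.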
@nbali I do see tables.patch is available. But by the time this resource was implemented, only tables.update was available. That is why you see # forces replacement in the plan, and this is by design. To request this change to use tables.patch, you may file an enhancement. Does that make sense?
The table should never be dropped, and that deletion protection should never be turned off.
@edwardmedia Would using tables.patch instead of tables.update change the "replace" behaviour? I don't see how that behaviour comes from the API at all, so why would switching calls change it?
"this is by design" - What part of your statement are you referring to here? That it recreates the table? I highly doubt anyone would assume the table will be recreated by a simple schema modification. So what is the design rationale behind that?
@nbali I am quoting the content from the link I provided here.
tables.patch: Updates information in an existing table. The update method replaces the entire table resource, whereas the patch method only replaces fields that are provided in the submitted table resource.
Again, tables.patch wasn't available back then and recreation was by design. Still have questions?
To request the change to use tables.patch, please file an enhancement. This is not a bug, so I am closing this issue accordingly.
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.