terraform-provider-google

google_notebooks_instance is being recreated when disk_encryption is not explicitly defined in resource

daltonmatos opened this issue 1 year ago • 7 comments

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request.
  • Please do not leave +1 or me too comments, they generate extra noise for issue followers and do not help prioritize the request.
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment.
  • If an issue is assigned to a user, that user is claiming responsibility for the issue.
  • Customers working with a Google Technical Account Manager or Customer Engineer can ask them to reach out internally to expedite investigation and resolution of this issue.

Terraform Version

terraform version
Terraform v1.7.3
on linux_amd64
+ provider registry.terraform.io/hashicorp/google v5.16.0
+ provider registry.terraform.io/hashicorp/google-beta v5.16.0

Affected Resource(s)

google_notebooks_instance

Terraform Configuration

locals {
  google_project_id     = "<project-id-here>"
  google_default_region = "us-east1"
  vpc_id                = "<vpc-id-here>"
  subnet_id             = "<subnet-id-here>"
}
terraform {
  backend "local" {
    path = "default.tfstate"
  }
}

data "google_compute_network" "network" {
  name = local.vpc_id
}

data "google_compute_subnetwork" "subnet" {
  name = local.subnet_id
}

provider "google" {
  project = local.google_project_id
  region  = local.google_default_region
}

provider "google-beta" {
  project = local.google_project_id
  region  = local.google_default_region
}

# Retrieve an access token as the Terraform runner
data "google_client_config" "provider" {}

terraform {
  required_version = "~> 1.7.0"

  required_providers {
    google      = "~> 5.0"
    google-beta = "~> 5.0"
  }
}
resource "google_notebooks_instance" "temporary_workbenches" {
  for_each = {
    "test-always-recreate-1" = {}
  }

  name     = each.key
  location = "us-east1-b"

  machine_type      = try(each.value.machine_type, "e2-highmem-8")
  boot_disk_type    = "PD_SSD"
  boot_disk_size_gb = try(each.value.disk_size_gb, 50)
  no_public_ip      = true
  no_proxy_access   = false

  network = data.google_compute_network.network.id
  subnet  = data.google_compute_subnetwork.subnet.id

  service_account_scopes = try(each.value.service_account_scopes, [
    "https://www.googleapis.com/auth/cloud-platform",
    "https://www.googleapis.com/auth/devstorage.read_write",
    "https://www.googleapis.com/auth/bigquery",
    "https://www.googleapis.com/auth/userinfo.email",
  ])

  metadata = {
    proxy-mode = "service_account"
    terraform  = "true"
  }

  vm_image {
    project      = "deeplearning-platform-release"
    image_family = "tf-latest-cpu"
  }

  lifecycle {
    ignore_changes = [
      service_account_scopes,
    ]
  }

}

Debug Output

https://gist.github.com/daltonmatos/7ef74c9e6de52dfa7dbde1003137de8c

Expected Behavior

The second terraform apply should find no changes and report that the infrastructure is up to date.

Actual Behavior

The resource is being recreated.

Steps to reproduce

Just run terraform apply twice.

  1. terraform apply
  2. terraform apply

Important Factoids

The field that is causing the recreation of the resource is disk_encryption. According to the docs, this field is optional, so it is not declared in the configuration.

Terraform seems to conclude that this field is being removed:

- disk_encryption        = "GMEK" -> null # forces replacement

but since "GMEK" is the documented default value, the plan should report no changes.

This same behavior happens with Terraform 1.4.7 and Google provider 4.84 (used in our production IaC). We first saw this recreation problem about two days ago; before that, we had 20+ notebooks running for months with no issues across recurring plans.

References

https://registry.terraform.io/providers/hashicorp/google/5.16.0/docs/resources/notebooks_instance#disk_encryption

daltonmatos, Feb 16 '24 17:02

google_notebooks_instance is an immutable resource and it gets re-created if a change is detected.

In the logs, it looks like the disk_encryption field was previously set to GMEK. Could you either add this field to your TF config, or try to figure out why the value was set? I don't see the default value being GMEK if the field is not defined.

hao-nan-li, Feb 16 '24 18:02

Hello @hao-nan-li,

The docs say that the default value is "GMEK", so I think that since the field is not declared, it received the default value. [Screenshot from 2024-02-16 15-24-30]

If I add disk_encryption = "GMEK", the plan returns "Up to date", but the point here is: this same configuration created the notebook on the first apply, and on the second apply (without any code modification) Terraform tries to recreate the resource. That's the problem.
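For reference, a minimal sketch of the workaround described in this comment: declare the value the API returns so that configuration and state agree. Only the disk_encryption line is new; the comment marks where the rest of the arguments from the configuration above stay unchanged, so this is a fragment, not a complete resource.

```hcl
resource "google_notebooks_instance" "temporary_workbenches" {
  # ... keep all existing arguments (for_each, name, location, vm_image, lifecycle, etc.) ...

  # Declare the value the API now reports, so the planned value matches
  # the state and the plan no longer shows "GMEK" -> null (forces replacement).
  disk_encryption = "GMEK"
}
```

Because "GMEK" is the documented default, declaring it explicitly should not change the existing instances; it only removes the spurious diff.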

daltonmatos, Feb 16 '24 18:02

Thanks for the clarification. It looks like this field has default_from_api set to true, but no default_value of GMEK: https://github.com/GoogleCloudPlatform/magic-modules/blob/main/mmv1/products/notebooks/Instance.yaml#L346

hao-nan-li, Feb 16 '24 18:02

We had some notebooks running for months. We are using Terraform 1.4.7 and Google provider 4.84, and we never had this issue before. For now, we added this field to our Terraform configuration, but I think that, since this field is optional, Terraform shouldn't try to recreate these resources; the plan should just report that everything is up to date. This sudden re-creation took us by surprise.

What could explain the re-creation happening only now?

Did the GCP API return null for this field before and then start returning the default value? Or was this field previously absent from the GCP API response payload and is now included?

Could this explain this Terraform behavior?

daltonmatos, Feb 16 '24 18:02

Should this be considered a bug in the Google provider?

daltonmatos, Feb 19 '24 13:02

> We had some notebooks running for months. We are using Terraform 1.4.7 and Google provider 4.84, and we never had this issue before. For now, we added this field to our Terraform configuration, but I think that, since this field is optional, Terraform shouldn't try to recreate these resources; the plan should just report that everything is up to date. This sudden re-creation took us by surprise.
>
> What could explain the re-creation happening only now?
>
> Did the GCP API return null for this field before and then start returning the default value? Or was this field previously absent from the GCP API response payload and is now included?
>
> Could this explain this Terraform behavior?

It's very possible that the field was not returned before, but now is returning the GMEK value by default. We have seen this happen in other APIs in the past causing this exact behavior.

slevenick, Feb 20 '24 16:02

That's interesting.

Is there anything I can do to prevent this behavior in the future? Or, since this is a factor external to Terraform, is the best I can do to always review the output of the plan command carefully? (Not that I don't look at my plan output, but until now I assumed that if the Terraform code didn't change, the resources it manages wouldn't change either.)

More importantly: was there anything I could have done to prevent this sudden recreation of resources?

daltonmatos, Feb 20 '24 16:02

The fix is in version 5.17 of the Google provider. After upgrading the provider to 5.17, terraform plan should be clean, and the resource should not be recreated without a configuration change.
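A minimal sketch of how the reporter's terraform block could require a provider release that includes the fix, assuming the object form of required_providers. The existing "~> 5.0" constraint already allows 5.17, so running terraform init -upgrade alone should pick it up; the lower bound below just prevents an older 5.x release from being selected again. The google-beta constraint is left as it was, since the thread only names 5.17 for the google provider, and nothing here says whether the fix was made available for the 4.x line used in production.

```hcl
terraform {
  required_version = "~> 1.7.0"

  required_providers {
    google = {
      source = "hashicorp/google"
      # 5.17 is the release named in this thread as containing the fix.
      version = ">= 5.17, < 6.0"
    }
    google-beta = {
      source  = "hashicorp/google-beta"
      version = "~> 5.0" # unchanged from the original configuration
    }
  }
}
```

After changing the constraint, run terraform init -upgrade and then terraform plan to confirm the disk_encryption diff is gone.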

zli82016, Feb 28 '24 21:02

> Was there anything I could have done to prevent this sudden recreation of resources?

Unfortunately the provider does not have control over what changes APIs might make. Reviewing your plans (especially if resources will be recreated) never hurts. You can also use deletion_protection fields (for resources that support them) or set lifecycle.prevent_destroy on critical infrastructure.
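As a sketch of the lifecycle.prevent_destroy suggestion applied to the configuration from this issue: with it set, Terraform rejects any plan that would destroy the instance, and a "forces replacement" plan includes a destroy, so an unexpected replacement fails at plan time instead of being applied. A resource can have only one lifecycle block, so the setting is merged into the existing one that already ignores service_account_scopes. Note that it also blocks intentional deletion until it is removed.

```hcl
resource "google_notebooks_instance" "temporary_workbenches" {
  # ... keep all existing arguments ...

  lifecycle {
    # Any plan that destroys this instance (including destroy-then-create
    # replacement) is rejected with an error.
    prevent_destroy = true

    ignore_changes = [
      service_account_scopes,
    ]
  }
}
```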

Closing this issue as resolved by https://github.com/GoogleCloudPlatform/magic-modules/pull/9915

melinath, Feb 28 '24 22:02

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions[bot], Apr 04 '24 02:04