
State is not properly tracked when using data_disk_type on google_notebooks_instance resource

Open · juanoi opened this issue 4 years ago • 8 comments

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request.
  • Please do not leave +1 or me too comments, they generate extra noise for issue followers and do not help prioritize the request.
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment.
  • If an issue is assigned to the modular-magician user, it is either in the process of being autogenerated, or is planned to be autogenerated soon. If an issue is assigned to a user, that user is claiming responsibility for the issue. If an issue is assigned to hashibot, a community member has claimed the issue already.

Terraform Version

v0.14.4

Affected Resource(s)

  • google_notebooks_instance

Terraform Configuration Files

provider "google-beta" {
  version = "~> 3.54.0"
  project = "PROJECTNAME"
  region  = "europe-west6"
}

resource "google_notebooks_instance" "notebook" {
  provider = google-beta

  name         = "INSTANCENAME"
  post_startup_script = "gs://somebucket/myscript.sh"
  machine_type = "n1-standard-1"
  location = "europe-west6-a"

  vm_image {
    project = "deeplearning-platform-release"
    image_family = "common-cpu-notebooks"
  }

  boot_disk_type = "PD_STANDARD"
  boot_disk_size_gb = 100
  data_disk_type = "PD_STANDARD"
  data_disk_size_gb = 100

  service_account = "MY-SERVICE-ACCOUNT"

  metadata = {
    "proxy-mode" = "none",
    "terraform" = "true"
  }
  
  no_public_ip = true
  network = "MYNETWORK"
  subnet = "MYSUBNETK"
}

Expected Behavior

After running an apply successfully, any subsequent apply or plan should show no changes.

Actual Behavior

After running an apply successfully, a subsequent apply or plan shows that the instance needs to be replaced because data_disk_type appears to change from "" to "PD_STANDARD" (an attribute that forces replacement).

Steps to Reproduce

  1. terraform apply
  2. terraform apply

Important Factoids

  • Explicitly inspecting the Terraform state file shows that the data_disk_type field is stored as an empty string rather than the configured value (see the command just after this list).
  • This happens both when authenticating as a user and as a service account.
  • The problem has been reproduced from different local hosts.
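
For anyone verifying this locally, terraform state show prints the attribute exactly as Terraform recorded it (resource address taken from the config above):

terraform state show google_notebooks_instance.notebook

In the failing case the output lists data_disk_type = "" even though the configuration sets "PD_STANDARD".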

juanoi avatar Feb 17 '21 12:02 juanoi

@juanoi The API is not returning those attributes in the response, hence no value is set in the state.

Although the instance was created a day ago, I see its state is still PROVISIONING. Are you seeing a similar API response?

{
    "name": "projects/test-project1-xxxx/locations/us-west1-a/instances/notebooks-instance",
    "serviceAccount": "[email protected]",
    "machineType": "https://www.googleapis.com/compute/v1/projects/test-project1-xxxx/zones/us-west1-a/machineTypes/e2-medium",
    "state": "PROVISIONING",
    "noPublicIp": true,
    "noProxyAccess": true,
    "network": "https://www.googleapis.com/compute/v1/projects/test-project1-xxxx/global/networks/default",
    "subnet": "https://www.googleapis.com/compute/v1/projects/test-project1-xxxx/regions/us-west1/subnetworks/default",
    "metadata": {
        "proxy-mode": "none",
        "terraform": "true",
        "shutdown-script": "/opt/deeplearning/bin/shutdown_script.sh",
        "notebooks-api": "PROD"
    },
    "createTime": "2021-02-18T00:06:42.241038824Z",
    "updateTime": "2021-02-18T00:07:13.016087205Z"
}
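
For reference, that response comes from the Notebooks API's instances.get method and can be fetched with the Cloud SDK (instance name and location taken from the response above; on SDKs of that era the command sits under the beta track):

gcloud beta notebooks instances describe notebooks-instance \
    --location=us-west1-a --format=json

If the disk-type fields are absent from that JSON, as they are above, the provider has nothing to write back into state on refresh.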

venkykuberan avatar Feb 18 '21 17:02 venkykuberan

@venkykuberan sorry for the delay replying.

Uhm, not really. I did see PROVISIONING in the Terraform state after the apply succeeded, but it was updated to RUNNING after I ran terraform refresh, which makes sense.

I just created a new instance, and 5-10 minutes after creating it gcloud shows me RUNNING as the status.

I'm running Google Cloud SDK 312.0.0.

juanoi avatar Mar 03 '21 13:03 juanoi

I'm actually having the same issue. I can reproduce it using the following steps:

  1. Create a new AI Platform notebook with the following Terraform config:
resource "google_notebooks_instance" "trainer-notebook-cu101-2" {
  name = "trainer-notebook-cu101-2"
  location = var.zone
  machine_type = "n1-standard-8"

  install_gpu_driver = true
  accelerator_config {
    type         = "NVIDIA_TESLA_P100"
    core_count   = 1
  }

  vm_image {
    project      = "deeplearning-platform-release"
    image_family = "common-cu101-notebooks-debian-9"
  }

  boot_disk_type = "PD_SSD"
  boot_disk_size_gb = 100

  data_disk_type = "PD_SSD"
  data_disk_size_gb = 100

  no_public_ip = true

  network = data.google_compute_network.network.id
  subnet = data.google_compute_subnetwork.subnetwork.id

  depends_on = [google_project_service.notebook]
}
  2. Run terraform apply --var-file environment/project.tfvars.
  3. Wait for the instance to be created.
  4. Once it is created, run terraform apply --var-file environment/project.tfvars again. You will now see the following output:
An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  ~ update in-place
-/+ destroy and then create replacement

Terraform will perform the following actions:

  # google_notebooks_instance.trainer-notebook-cu101-2 must be replaced
-/+ resource "google_notebooks_instance" "trainer-notebook-cu101-2" {
        boot_disk_size_gb      = 100
        boot_disk_type         = "PD_SSD"
      ~ create_time            = "2021-07-05T07:20:58.003814147Z" -> (known after apply)
        data_disk_size_gb      = 100
      + data_disk_type         = "PD_SSD" # forces replacement
      ~ id                     = "projects/[hidden]/locations/europe-west1-d/instances/trainer-notebook-cu101-2" -> (known after apply)
        install_gpu_driver     = true
      - labels                 = {
          - "goog-caip-notebook" = ""
        } -> null
        location               = "europe-west1-d"
        machine_type           = "n1-standard-8"
        name                   = "trainer-notebook-cu101-2"
      ~ network                = "https://www.googleapis.com/compute/v1/projects/[hidden]/global/networks/main-network" -> "projects/[hidden]/global/networks/main-network"
      - no_proxy_access        = false -> null
        no_public_ip           = true
      ~ project                = "[hidden]" -> (known after apply)
      ~ proxy_uri              = "[hidden]" -> (known after apply)
      ~ service_account        = "[hidden]" -> (known after apply)
      - service_account_scopes = [] -> null
      ~ state                  = "ACTIVE" -> (known after apply)
      ~ subnet                 = "https://www.googleapis.com/compute/v1/projects/[hidden]/regions/europe-west1/subnetworks/europe-west1" -> "projects/[hidden]/regions/europe-west1/subnetworks/europe-west1"
      - tags                   = [] -> null
      ~ update_time            = "2021-07-05T07:23:42.527571088Z" -> (known after apply)

        accelerator_config {
            core_count = 1
            type       = "NVIDIA_TESLA_P100"
        }

      ~ shielded_instance_config {
          ~ enable_integrity_monitoring = true -> (known after apply)
          ~ enable_secure_boot          = false -> (known after apply)
          ~ enable_vtpm                 = true -> (known after apply)
        }

      ~ vm_image {
            image_family = "common-cu101-notebooks-debian-9"
            project      = "deeplearning-platform-release"
        }
    }

Plan: 1 to add, 1 to destroy.

(I have [hidden] some sensitive content.)

Note the # forces replacement next to data_disk_type. This doesn't make sense, as this notebook was just created and I didn't change anything about it in the meantime.

When I remove the data_disk_type = "PD_SSD" from the Terraform config, it doesn't try to destroy the notebook anymore.
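
A possible stopgap that keeps the argument in the config is an ignore_changes lifecycle rule; this is only a sketch and I haven't verified it against this provider version:

resource "google_notebooks_instance" "trainer-notebook-cu101-2" {
  # ... same arguments as in the config above ...
  data_disk_type = "PD_SSD"

  # Work around the API never echoing data_disk_type back:
  # Terraform stops diffing this attribute entirely, so genuine
  # changes to it will also be ignored from now on.
  lifecycle {
    ignore_changes = [data_disk_type]
  }
}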


Version Info:

Terraform v0.13.7
+ provider registry.terraform.io/hashicorp/google v3.62.0

JeremyKeusters avatar Jul 05 '21 07:07 JeremyKeusters

May I know if this issue is resolved? Our team is also facing the exact same issue: whenever the instance is updated, the data disk gets destroyed and re-created, whereas when we update it manually in the GCP console without Terraform, the data disk is not deleted.

theorigin2030 avatar Jan 11 '22 13:01 theorigin2030

Hi, we are experiencing this issue too. Is this issue going to be fixed any time soon?

atemate avatar Jan 25 '22 18:01 atemate

experiencing the same issue running a pretty minimal example:

resource "google_notebooks_instance" "notebook" {
  name = "jupyter-instance"
  location = var.zone
  machine_type = "e2-medium"

  container_image {
    repository = "gcr.io/deeplearning-platform-release/base-cpu"
    tag = "latest"
  }

  # install_gpu_driver = true
  data_disk_type = "PD_SSD"
  data_disk_size_gb = 110
  no_remove_data_disk = true

  metadata = {
    terraform = "true"
  }
}

This reports the same forced replacement as the other users saw:

  # google_notebooks_instance.notebook must be replaced
-/+ resource "google_notebooks_instance" "notebook" {
      ~ create_time            = "2022-02-14T01:01:08.481085816Z" -> (known after apply)
      + data_disk_type         = "PD_SSD" # forces replacement
...

alienczf avatar Feb 14 '22 13:02 alienczf

Hi @rileykarson, will this issue be fixed?

kirupa-ambiata avatar Jul 11 '22 04:07 kirupa-ambiata

I've unassigned myself as I'm not actively working on it. It's something we'd like to fix, but haven't found the cycles for.

rileykarson avatar Jul 12 '22 00:07 rileykarson

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions[bot] avatar Oct 08 '22 02:10 github-actions[bot]