terraform-kubestack
Cluster Recreate Causes Kustomize Module To Fail
Problem
We are recreating our cluster to enable private node pools. The issue seems to be that, because the cluster is being recreated, the Kustomize provider tries to communicate with the Kubernetes cluster on the default localhost address.
Logs
Error: ResourceDiff: Get "http://localhost/api?timeout=32s": dial tcp 127.0.0.1:80: connect: connection refused
on .terraform/modules/gke_zero/common/cluster_services/main.tf line 16, in resource "kustomization_resource" "current":
16: resource "kustomization_resource" "current" {
Error: Process completed with exit code 1.
Steps To Reproduce
Create a cluster with the setting:
enable_private_nodes = false
Then, once the cluster is created, change the value:
enable_private_nodes = true
and run the following in that Terraform workspace:
terraform plan
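For context, a hedged sketch of where this flag might sit in a Kubestack GKE root module; the module source ref and everything in the configuration map besides enable_private_nodes are illustrative placeholders and may not match your setup:

module "gke_zero" {
  source = "github.com/kbst/terraform-kubestack//google/cluster?ref=<version>"

  configuration = {
    apps = {
      # flipping this from false to true after the initial apply forces GKE
      # to destroy and recreate the cluster, which triggers the localhost
      # error above during the next plan
      enable_private_nodes = true

      # ...other required cluster settings omitted...
    }

    ops = {}
  }
}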
Workaround
There is currently a workaround:
terraform apply --target=<cluster module>
This will update the cluster, which should then fix the problem.
This needs more investigation, but I've also seen this myself. The module receives the credentials output by the cluster resources as an input. My preliminary investigation suggests that when creating a plan to create the cluster and the cluster services, Terraform gets the dependency graph right. On destroy, the order also seems correct: K8s resources first, then the cluster. But if the cluster gets destroyed and recreated, the graph does not first destroy the K8s resources, then destroy the cluster, then recreate the cluster, and finally recreate the resources. That means the resources stay in the state, but there are no cluster credentials to refresh them during plan.
To make it easier to understand, I created a simple config to reproduce the issue. https://github.com/pst/debugrecreateplan
The example repo shows the behaviour with both the official Kubernetes provider and my kustomize provider on top of a KinD cluster, so it's not specific to the Google provider either.
So far it seems to support my theory: create and destroy plans correctly handle both the resources and the cluster, but destroy-and-recreate plans do not handle the K8s resources on the cluster at all.
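For reference, a minimal sketch of what such a repro config can look like, assuming the tehcyx/kind and kbst/kustomization providers; the kustomization path and provider arguments are assumptions, and the actual config is in the linked repo and may differ in the details:

terraform {
  required_providers {
    kind = {
      source = "tehcyx/kind"
    }
    kustomization = {
      source = "kbst/kustomization"
    }
  }
}

resource "kind_cluster" "current" {
  name           = "debug-kind-kustomize"
  wait_for_ready = false

  kind_config {
    api_version = "kind.x-k8s.io/v1alpha4"
    kind        = "Cluster"

    node {
      role = "control-plane"
    }

    node {
      role = "worker"
    }
  }
}

# The provider gets its credentials from the cluster resource's outputs. When
# the cluster is planned for replacement, the kubeconfig is unknown during
# plan and the provider falls back to localhost.
provider "kustomization" {
  kubeconfig_raw = kind_cluster.current.kubeconfig
}

# Builds the manifests, here a single "debug" namespace, from a kustomization
data "kustomization" "current" {
  path = "manifests"
}

resource "kustomization_resource" "current" {
  for_each = data.kustomization.current.ids

  manifest = data.kustomization.current.manifests[each.value]
}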
Create plan
[pst@pst-ryzen5 kind-kustomize]$ terraform plan
Refreshing Terraform state in-memory prior to plan...
The refreshed state will be used to calculate this plan, but will not be
persisted to local or remote state storage.
data.kustomization.current: Refreshing state...
------------------------------------------------------------------------
An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
+ create
Terraform will perform the following actions:
# kind_cluster.current will be created
+ resource "kind_cluster" "current" {
+ client_certificate = (known after apply)
+ client_key = (known after apply)
+ cluster_ca_certificate = (known after apply)
+ endpoint = (known after apply)
+ id = (known after apply)
+ kubeconfig = (known after apply)
+ kubeconfig_path = (known after apply)
+ name = "debug-kind-kustomize"
+ node_image = (known after apply)
+ wait_for_ready = false
+ kind_config {
+ api_version = "kind.x-k8s.io/v1alpha4"
+ kind = "Cluster"
+ node {
+ role = "control-plane"
}
+ node {
+ role = "worker"
}
}
}
# kustomization_resource.current["~G_v1_Namespace|~X|debug"] will be created
+ resource "kustomization_resource" "current" {
+ id = (known after apply)
+ manifest = jsonencode(
{
+ apiVersion = "v1"
+ kind = "Namespace"
+ metadata = {
+ creationTimestamp = null
+ name = "debug"
}
+ spec = {}
+ status = {}
}
)
}
Plan: 2 to add, 0 to change, 0 to destroy.
------------------------------------------------------------------------
Destroy plan
[pst@pst-ryzen5 kind-kustomize]$ terraform plan --destroy
Refreshing Terraform state in-memory prior to plan...
The refreshed state will be used to calculate this plan, but will not be
persisted to local or remote state storage.
kind_cluster.current: Refreshing state... [id=debug-kind-kustomize-]
data.kustomization.current: Refreshing state... [id=5ffdb4bad7b4e2b4bd9a26a69a96e21e37a92301ca7108f731dc120dd806d5a2ec22feaaf104d9ad23dca0be7b50aaf0d0587f26a19df5dcd053d4eef745b704]
kustomization_resource.current["~G_v1_Namespace|~X|debug"]: Refreshing state... [id=094e469a-08f9-47e4-a9f3-a39ae8268a89]
------------------------------------------------------------------------
An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
- destroy
Terraform will perform the following actions:
# kind_cluster.current will be destroyed
- resource "kind_cluster" "current" {
- client_certificate = <<~EOT
...
EOT -> null
- client_key = <<~EOT
...
EOT -> null
- cluster_ca_certificate = <<~EOT
...
EOT -> null
- endpoint = "https://127.0.0.1:44033" -> null
- id = "debug-kind-kustomize-" -> null
- kubeconfig = <<~EOT
...
EOT -> null
- kubeconfig_path = "/home/pst/Code/pst/debugrecreateplan/kind-kustomize/debug-kind-kustomize-config" -> null
- name = "debug-kind-kustomize" -> null
- wait_for_ready = false -> null
- kind_config {
- api_version = "kind.x-k8s.io/v1alpha4" -> null
- containerd_config_patches = [] -> null
- kind = "Cluster" -> null
- node {
- kubeadm_config_patches = [] -> null
- role = "control-plane" -> null
}
- node {
- kubeadm_config_patches = [] -> null
- role = "worker" -> null
}
}
}
# kustomization_resource.current["~G_v1_Namespace|~X|debug"] will be destroyed
- resource "kustomization_resource" "current" {
- id = "094e469a-08f9-47e4-a9f3-a39ae8268a89" -> null
- manifest = jsonencode(
{
- apiVersion = "v1"
- kind = "Namespace"
- metadata = {
- creationTimestamp = null
- name = "debug"
}
- spec = {}
- status = {}
}
) -> null
}
Plan: 0 to add, 0 to change, 2 to destroy.
------------------------------------------------------------------------
Destroy & recreate plan
Triggered by changing node_count in main.tf (see the sketch below). Does not include the K8s namespace.
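A hedged sketch of one way node_count could be wired into the kind_cluster.current resource from the sketch above; the exact wiring is in the linked repo, this just illustrates why bumping the count adds a node block and forces replacement:

variable "node_count" {
  type    = number
  default = 1
}

resource "kind_cluster" "current" {
  name           = "debug-kind-kustomize"
  wait_for_ready = false

  kind_config {
    api_version = "kind.x-k8s.io/v1alpha4"
    kind        = "Cluster"

    node {
      role = "control-plane"
    }

    # one worker node block per node_count; adding a block changes kind_config,
    # which forces the whole kind cluster to be destroyed and recreated
    dynamic "node" {
      for_each = range(var.node_count)

      content {
        role = "worker"
      }
    }
  }
}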
[pst@pst-ryzen5 kind-kustomize]$ terraform plan
Refreshing Terraform state in-memory prior to plan...
The refreshed state will be used to calculate this plan, but will not be
persisted to local or remote state storage.
kind_cluster.current: Refreshing state... [id=debug-kind-kustomize-]
data.kustomization.current: Refreshing state... [id=5ffdb4bad7b4e2b4bd9a26a69a96e21e37a92301ca7108f731dc120dd806d5a2ec22feaaf104d9ad23dca0be7b50aaf0d0587f26a19df5dcd053d4eef745b704]
kustomization_resource.current["~G_v1_Namespace|~X|debug"]: Refreshing state... [id=094e469a-08f9-47e4-a9f3-a39ae8268a89]
------------------------------------------------------------------------
An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
-/+ destroy and then create replacement
Terraform will perform the following actions:
# kind_cluster.current must be replaced
-/+ resource "kind_cluster" "current" {
~ client_certificate = <<~EOT
...
EOT -> (known after apply)
~ client_key = <<~EOT
...
EOT -> (known after apply)
~ cluster_ca_certificate = <<~EOT
...
EOT -> (known after apply)
~ endpoint = "https://127.0.0.1:44033" -> (known after apply)
~ id = "debug-kind-kustomize-" -> (known after apply)
~ kubeconfig = <<~EOT
...
EOT -> (known after apply)
~ kubeconfig_path = "/home/pst/Code/pst/debugrecreateplan/kind-kustomize/debug-kind-kustomize-config" -> (known after apply)
name = "debug-kind-kustomize"
+ node_image = (known after apply)
wait_for_ready = false
~ kind_config {
api_version = "kind.x-k8s.io/v1alpha4"
- containerd_config_patches = [] -> null
kind = "Cluster"
~ node { # forces replacement
- kubeadm_config_patches = [] -> null
role = "control-plane"
}
~ node { # forces replacement
- kubeadm_config_patches = [] -> null
role = "worker"
}
+ node { # forces replacement
+ role = "worker" # forces replacement
}
}
}
Plan: 1 to add, 0 to change, 1 to destroy.
------------------------------------------------------------------------
Likely related upstream issue: https://github.com/hashicorp/terraform/issues/22572
This seems to be fixed, based on the last couple of times I've used it. I'll do a test to make sure.
This is definitely still an issue. It's not Kubestack specific, but rather a general issue with Terraform.
I hope that moving away from the in-module manifests and towards the new native modules will make the issue less frequent. But even then, for example, the auth ConfigMap for EKS inside the module may still cause this.
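For illustration, that kind of in-module resource looks roughly like the usual aws-auth ConfigMap pattern below. This is a sketch, not Kubestack's actual implementation; it assumes an aws_eks_cluster.current resource exists in the same module, and the role ARN is a placeholder:

data "aws_eks_cluster_auth" "current" {
  name = aws_eks_cluster.current.name
}

provider "kubernetes" {
  host                   = aws_eks_cluster.current.endpoint
  cluster_ca_certificate = base64decode(aws_eks_cluster.current.certificate_authority[0].data)
  token                  = data.aws_eks_cluster_auth.current.token
}

# Because this ConfigMap lives in the same module as the cluster, a forced
# cluster replacement leaves it in state with no valid credentials to refresh
# it during plan, the same failure mode as above.
resource "kubernetes_config_map" "aws_auth" {
  metadata {
    name      = "aws-auth"
    namespace = "kube-system"
  }

  data = {
    mapRoles = yamlencode([
      {
        rolearn  = "arn:aws:iam::123456789012:role/example-node-role" # placeholder
        username = "system:node:{{EC2PrivateDNSName}}"
        groups   = ["system:bootstrappers", "system:nodes"]
      }
    ])
  }
}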
The only real workaround is a --target apply to deploy the changes to the cluster individually, which is a bummer because it breaks automation. However, recreating the cluster is a disruptive change and should be rare for most teams.
Ah right, I recreated a GKE cluster that had a couple of manifests and it didn't break; I was hoping that meant it was fixed.