terraform-kubestack
Cluster Recreate Causes Kustomize Module To Fail
Problem
We are recreating our cluster to enable private node pools. The issue seems to be that, because the cluster is being recreated, the Kustomize provider tries to communicate with the Kubernetes cluster on the default localhost address.
Logs
Error: ResourceDiff: Get "http://localhost/api?timeout=32s": dial tcp 127.0.0.1:80: connect: connection refused
on .terraform/modules/gke_zero/common/cluster_services/main.tf line 16, in resource "kustomization_resource" "current":
16: resource "kustomization_resource" "current" {
Error: Process completed with exit code 1.
Steps To Reproduce
Create a cluster with the setting:
enable_private_nodes = false
Then, once the cluster is created, change the value:
enable_private_nodes = true
and run the following in that Terraform workspace:
terraform plan
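For context, a hedged sketch of where this flag might sit in a Kubestack GKE root module; the module source ref and everything in the configuration map besides enable_private_nodes are illustrative placeholders and may not match your setup:

module "gke_zero" {
  source = "github.com/kbst/terraform-kubestack//google/cluster?ref=<version>"

  configuration = {
    apps = {
      # flipping this from false to true after the initial apply forces GKE
      # to destroy and recreate the cluster, which triggers the localhost
      # error above during the next plan
      enable_private_nodes = true

      # ...other required cluster settings omitted...
    }

    ops = {}
  }
}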
Workaround
There is currently a workaround:
terraform apply --target=<cluster module>
This will update the cluster, which should then fix the problem.
This needs more investigation, but I've also seen this myself. The module receives the credentials output by the cluster resources as an input. My preliminary investigation suggests that when creating a plan to create the cluster and the cluster services, Terraform gets the dependency graph right. On destroy, the order also seems correct: K8s resources first, then the cluster. But if the cluster gets destroyed and recreated, the graph does not first destroy the K8s resources, then destroy the cluster, then recreate the cluster, and finally recreate the resources. That means the resources stay in the state, but there are no cluster credentials to refresh them during plan.
To make it easier to understand, I created a simple config to reproduce the issue. https://github.com/pst/debugrecreateplan
The example repo shows the behaviour with both the official Kubernetes provider and my kustomize provider on top of a KinD cluster, so it's not specific to the Google provider either.
So far it seems to support my theory: create and destroy plans correctly handle both the resources and the cluster, but destroy-and-recreate plans do not handle the K8s resources on the cluster at all.
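For reference, a minimal sketch of what such a repro config can look like, assuming the tehcyx/kind and kbst/kustomization providers; the kustomization path and provider arguments are assumptions, and the actual config is in the linked repo and may differ in the details:

terraform {
  required_providers {
    kind = {
      source = "tehcyx/kind"
    }
    kustomization = {
      source = "kbst/kustomization"
    }
  }
}

resource "kind_cluster" "current" {
  name           = "debug-kind-kustomize"
  wait_for_ready = false

  kind_config {
    api_version = "kind.x-k8s.io/v1alpha4"
    kind        = "Cluster"

    node {
      role = "control-plane"
    }

    node {
      role = "worker"
    }
  }
}

# The provider gets its credentials from the cluster resource's outputs. When
# the cluster is planned for replacement, the kubeconfig is unknown during
# plan and the provider falls back to localhost.
provider "kustomization" {
  kubeconfig_raw = kind_cluster.current.kubeconfig
}

# Builds the manifests, here a single "debug" namespace, from a kustomization
data "kustomization" "current" {
  path = "manifests"
}

resource "kustomization_resource" "current" {
  for_each = data.kustomization.current.ids

  manifest = data.kustomization.current.manifests[each.value]
}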
Create plan
[pst@pst-ryzen5 kind-kustomize]$ terraform plan
Refreshing Terraform state in-memory prior to plan...
The refreshed state will be used to calculate this plan, but will not be
persisted to local or remote state storage.
data.kustomization.current: Refreshing state...
------------------------------------------------------------------------
An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
+ create
Terraform will perform the following actions:
# kind_cluster.current will be created
+ resource "kind_cluster" "current" {
+ client_certificate = (known after apply)
+ client_key = (known after apply)
+ cluster_ca_certificate = (known after apply)
+ endpoint = (known after apply)
+ id = (known after apply)
+ kubeconfig = (known after apply)
+ kubeconfig_path = (known after apply)
+ name = "debug-kind-kustomize"
+ node_image = (known after apply)
+ wait_for_ready = false
+ kind_config {
+ api_version = "kind.x-k8s.io/v1alpha4"
+ kind = "Cluster"
+ node {
+ role = "control-plane"
}
+ node {
+ role = "worker"
}
}
}
# kustomization_resource.current["~G_v1_Namespace|~X|debug"] will be created
+ resource "kustomization_resource" "current" {
+ id = (known after apply)
+ manifest = jsonencode(
{
+ apiVersion = "v1"
+ kind = "Namespace"
+ metadata = {
+ creationTimestamp = null
+ name = "debug"
}
+ spec = {}
+ status = {}
}
)
}
Plan: 2 to add, 0 to change, 0 to destroy.
------------------------------------------------------------------------
Destroy plan
[pst@pst-ryzen5 kind-kustomize]$ terraform plan --destroy
Refreshing Terraform state in-memory prior to plan...
The refreshed state will be used to calculate this plan, but will not be
persisted to local or remote state storage.
kind_cluster.current: Refreshing state... [id=debug-kind-kustomize-]
data.kustomization.current: Refreshing state... [id=5ffdb4bad7b4e2b4bd9a26a69a96e21e37a92301ca7108f731dc120dd806d5a2ec22feaaf104d9ad23dca0be7b50aaf0d0587f26a19df5dcd053d4eef745b704]
kustomization_resource.current["~G_v1_Namespace|~X|debug"]: Refreshing state... [id=094e469a-08f9-47e4-a9f3-a39ae8268a89]
------------------------------------------------------------------------
An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
- destroy
Terraform will perform the following actions:
# kind_cluster.current will be destroyed
- resource "kind_cluster" "current" {
- client_certificate = <<~EOT
...
EOT -> null
- client_key = <<~EOT
...
EOT -> null
- cluster_ca_certificate = <<~EOT
...
EOT -> null
- endpoint = "https://127.0.0.1:44033" -> null
- id = "debug-kind-kustomize-" -> null
- kubeconfig = <<~EOT
...
EOT -> null
- kubeconfig_path = "/home/pst/Code/pst/debugrecreateplan/kind-kustomize/debug-kind-kustomize-config" -> null
- name = "debug-kind-kustomize" -> null
- wait_for_ready = false -> null
- kind_config {
- api_version = "kind.x-k8s.io/v1alpha4" -> null
- containerd_config_patches = [] -> null
- kind = "Cluster" -> null
- node {
- kubeadm_config_patches = [] -> null
- role = "control-plane" -> null
}
- node {
- kubeadm_config_patches = [] -> null
- role = "worker" -> null
}
}
}
# kustomization_resource.current["~G_v1_Namespace|~X|debug"] will be destroyed
- resource "kustomization_resource" "current" {
- id = "094e469a-08f9-47e4-a9f3-a39ae8268a89" -> null
- manifest = jsonencode(
{
- apiVersion = "v1"
- kind = "Namespace"
- metadata = {
- creationTimestamp = null
- name = "debug"
}
- spec = {}
- status = {}
}
) -> null
}
Plan: 0 to add, 0 to change, 2 to destroy.
------------------------------------------------------------------------
Destroy & recreate plan
Triggered by changing node_count in main.tf (see the sketch below). Does not include the K8s namespace.
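A hedged sketch of one way node_count could be wired into the kind_cluster.current resource from the sketch above; the exact wiring is in the linked repo, this just illustrates why bumping the count adds a node block and forces replacement:

variable "node_count" {
  type    = number
  default = 1
}

resource "kind_cluster" "current" {
  name           = "debug-kind-kustomize"
  wait_for_ready = false

  kind_config {
    api_version = "kind.x-k8s.io/v1alpha4"
    kind        = "Cluster"

    node {
      role = "control-plane"
    }

    # one worker node block per node_count; adding a block changes kind_config,
    # which forces the whole kind cluster to be destroyed and recreated
    dynamic "node" {
      for_each = range(var.node_count)

      content {
        role = "worker"
      }
    }
  }
}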
[pst@pst-ryzen5 kind-kustomize]$ terraform plan
Refreshing Terraform state in-memory prior to plan...
The refreshed state will be used to calculate this plan, but will not be
persisted to local or remote state storage.
kind_cluster.current: Refreshing state... [id=debug-kind-kustomize-]
data.kustomization.current: Refreshing state... [id=5ffdb4bad7b4e2b4bd9a26a69a96e21e37a92301ca7108f731dc120dd806d5a2ec22feaaf104d9ad23dca0be7b50aaf0d0587f26a19df5dcd053d4eef745b704]
kustomization_resource.current["~G_v1_Namespace|~X|debug"]: Refreshing state... [id=094e469a-08f9-47e4-a9f3-a39ae8268a89]
------------------------------------------------------------------------
An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
-/+ destroy and then create replacement
Terraform will perform the following actions:
# kind_cluster.current must be replaced
-/+ resource "kind_cluster" "current" {
~ client_certificate = <<~EOT
...
EOT -> (known after apply)
~ client_key = <<~EOT
...
EOT -> (known after apply)
~ cluster_ca_certificate = <<~EOT
...
EOT -> (known after apply)
~ endpoint = "https://127.0.0.1:44033" -> (known after apply)
~ id = "debug-kind-kustomize-" -> (known after apply)
~ kubeconfig = <<~EOT
...
EOT -> (known after apply)
~ kubeconfig_path = "/home/pst/Code/pst/debugrecreateplan/kind-kustomize/debug-kind-kustomize-config" -> (known after apply)
name = "debug-kind-kustomize"
+ node_image = (known after apply)
wait_for_ready = false
~ kind_config {
api_version = "kind.x-k8s.io/v1alpha4"
- containerd_config_patches = [] -> null
kind = "Cluster"
~ node { # forces replacement
- kubeadm_config_patches = [] -> null
role = "control-plane"
}
~ node { # forces replacement
- kubeadm_config_patches = [] -> null
role = "worker"
}
+ node { # forces replacement
+ role = "worker" # forces replacement
}
}
}
Plan: 1 to add, 0 to change, 1 to destroy.
------------------------------------------------------------------------
Likely related upstream issue: https://github.com/hashicorp/terraform/issues/22572
This seems to be fixed, based on the last couple of times I've used it. I'll do a test to make sure.
This is definitely still an issue. It's not Kubestack specific, but rather a general issue with Terraform.
I hope that moving away from the in-module manifests and towards the new native modules will make the issue less frequent. But even then, for example, the auth ConfigMap for EKS inside the module may still cause this.
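For illustration, that kind of in-module resource looks roughly like the usual aws-auth ConfigMap pattern below. This is a sketch, not Kubestack's actual implementation; it assumes an aws_eks_cluster.current resource exists in the same module, and the role ARN is a placeholder:

data "aws_eks_cluster_auth" "current" {
  name = aws_eks_cluster.current.name
}

provider "kubernetes" {
  host                   = aws_eks_cluster.current.endpoint
  cluster_ca_certificate = base64decode(aws_eks_cluster.current.certificate_authority[0].data)
  token                  = data.aws_eks_cluster_auth.current.token
}

# Because this ConfigMap lives in the same module as the cluster, a forced
# cluster replacement leaves it in state with no valid credentials to refresh
# it during plan, the same failure mode as above.
resource "kubernetes_config_map" "aws_auth" {
  metadata {
    name      = "aws-auth"
    namespace = "kube-system"
  }

  data = {
    mapRoles = yamlencode([
      {
        rolearn  = "arn:aws:iam::123456789012:role/example-node-role" # placeholder
        username = "system:node:{{EC2PrivateDNSName}}"
        groups   = ["system:bootstrappers", "system:nodes"]
      }
    ])
  }
}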
The only real workaround is a --target apply to deploy the changes to the cluster individually, which is a bummer because it breaks automation. However, recreating the cluster is a disruptive change and should be rare for most teams.
Ah right, I recreated a GKE cluster that had a couple of manifests and it didn't break; I was hoping that meant it was fixed.