terraform-provider-kubectl
Terraform state is updated when an update apply fails
When the provider fails while applying an update to a kubectl resource, the change is still persisted in the Terraform state. Subsequent plans then generate no changes, and the inconsistency remains silently present.
To mitigate, the administrator has to manually identify every case where the state has become out of sync and then trigger a change, such as making a superfluous edit to the YAML definition (e.g. adding a tmp annotation) to force the provider to update the resource.
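Identifying the drift is currently a manual comparison between what Terraform has recorded and what the cluster is actually running. A rough sketch, reusing the resource address and deployment name from the example below:

terraform state show kubectl_manifest.test       # the YAML Terraform believes was applied (yaml_body_parsed)
kubectl get deployment nginx-deployment -o yaml  # the object the cluster is actually running

Any field that differs between the two outputs (replicas, annotations, image, and so on) indicates a resource that needs to be forced through an update.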
Steps to reproduce:
Using Rancher Desktop as an example, with the following definition:
terraform {
  required_version = ">= 0.13"

  required_providers {
    kubectl = {
      source  = "gavinbunney/kubectl"
      version = ">= 1.7.0"
    }
  }
}

provider "kubectl" {
  host             = "127.0.0.1:6443"
  load_config_file = true
  config_context   = "rancher-desktop"
  insecure         = true
}

resource "kubectl_manifest" "test" {
  yaml_body = <<YAML
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  annotations:
    tmp: one
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 1
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
        ports:
        - containerPort: 80
YAML
}
1. Run terraform init, then run terraform apply and apply the plan to create the resource
2. Update spec.replicas to 2 and the annotation to tmp: two
3. Run terraform apply to generate the plan, but don't yet type 'yes' to run it
4. Simulate a network partition somehow. As I'm using a local cluster I will just shut it down
5. Type 'yes' to apply the plan
6. Observe that the apply fails, and that terraform state pull shows that replicas: 2 and tmp: two were persisted to the TF state (see the check after this list)
7. Resolve the network partition
8. Observe that the replica count and annotation are still 1/one on the cluster
9. Run terraform plan and observe that 'no changes' are reported by the provider
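For steps 6 and 8, the mismatch can be confirmed field by field. This is only a sketch reusing the names from the example above; the jsonpath expressions are just one way to read the live values:

terraform state pull | grep -E 'replicas:|tmp:'                                           # state records replicas: 2 and tmp: two
kubectl get deployment nginx-deployment -o jsonpath='{.spec.replicas}{"\n"}'              # cluster still reports 1
kubectl get deployment nginx-deployment -o jsonpath='{.metadata.annotations.tmp}{"\n"}'   # cluster still reports one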
Workaround:
As a workaround, apply another change to the YAML and apply it, then reverse it:
10. Change the replicas and annotation to 3/three
11. Run terraform apply and apply the plan
12. Observe that the plan shows a diff from 2->3 (see the plan output below) rather than the 1->3 change that will actually be applied to the cluster
13. Change them back to 2 and apply - you are now at the desired state (a quick check follows the plan output below)
  # kubectl_manifest.test will be updated in-place
  ~ resource "kubectl_manifest" "test" {
        id               = "/apis/apps/v1/namespaces/default/deployments/nginx-deployment"
        name             = "nginx-deployment"
      ~ yaml_body        = (sensitive value)
      ~ yaml_body_parsed = <<-EOT
            apiVersion: apps/v1
            kind: Deployment
            metadata:
              annotations:
          -     tmp: two
          +     tmp: three
              name: nginx-deployment
            spec:
          -   replicas: 2
          +   replicas: 3
              selector:
                matchLabels:
                  app: nginx
              template:
                metadata:
                  labels:
                    app: nginx
                spec:
                  containers:
                  - image: nginx:1.14.2
                    name: nginx
                    ports:
                    - containerPort: 80
        EOT
        # (13 unchanged attributes hidden)
    }
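After step 13 the state and the cluster should agree again. A quick confirmation, again just a sketch reusing the example's names (terraform plan's -detailed-exitcode flag exits 0 when no changes are planned):

terraform plan -detailed-exitcode                                                                          # exit code 0: no changes planned
kubectl get deployment nginx-deployment -o jsonpath='{.spec.replicas} {.metadata.annotations.tmp}{"\n"}'   # should print: 2 two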
Shutting down the cluster is, of course, a contrived example. This bug was actually found on a real cluster: the CI worker's K8s credentials expired during a long task that ran before the TF apply, causing an Unauthorized response from K8s.
I suspect this bug is related to the following section of the documentation: https://developer.hashicorp.com/terraform/plugin/framework/diagnostics#how-errors-affect-state

How Errors Affect State
Returning an error diagnostic does not stop the state from being updated. Terraform will still persist the returned state even when an error diagnostic is returned with it. This is to allow Terraform to persist the values that have already been modified when a resource modification requires multiple API requests or an API request fails after an earlier one succeeded.
When returning error diagnostics, we recommend resetting the state in the response to the prior state available in the configuration.