terraform-provider-kubernetes icon indicating copy to clipboard operation
terraform-provider-kubernetes copied to clipboard

Fix kubernetes_daemon_set not waiting for rollout. Fixes #2092

Open sbocinec opened this issue 1 year ago • 3 comments

Description

Fixes kubernetes_daemon_set_v1 resource to wait for rollout. Current check for the rollout when wait_for_rollout = true is ineffective and is not waiting for rollout at all as it checks a wrong status field of a DaemonSet.

:exclamation: What is worse, there is currently no way how to correctly manage a DaemonSet using kubernetes provider as also kubernetes_manifest is affected by a nasty bug so every DaemonSet change ends up with a failure..

Impacted by the issue, I prepared a reproducer TF code here. Troubleshooting the issue by applying a new resource and changing existing one I noticed, the current code is checking a wrong field, so wait_for_rollout = true has no effect as incorrect status fiels is check.

Additionally, majority of the daemonset resource tests must have been fixed as after fixing the rollout issues, these have been failing as the the original manifests failed to be rolled out.

:warning: this fix might break many existing DaemonSets as by default wait_for_rollout is set to true in the resource. Until now, waiting for rollout was ineffective, though after it's fixed, apply is going to wait until all the daemonset pods are in Ready state. I expect this might cause some issues for users using this resource as their existing TF code might start to time out on apply. This is also visible on the number of test case changes, that must have been updated.

Troubleshooting - apply of a new resource

Plan: 1 to add, 0 to change, 0 to destroy.                                                                                                                                                                                                                                                                                                                              

kubernetes_daemon_set_v1.i-am-not-waiting-for-rollout: Creating...                                                                                                                   
kubernetes_daemon_set_v1.i-am-not-waiting-for-rollout: Creation complete after 2s [id=default/i-am-not-waiting-for-rollout]                                                          
                                                                                          
Apply complete! Resources: 1 added, 0 changed, 0 destroyed.      

The apply is almost instant. While the apply is running,, this is how the daemonset status looks like: `

$ while true; do kubectl get ds/i-am-not-waiting-for-rollout --output="jsonpath={.status}"  ;echo;  done` printing the DS Error from server (NotFound): daemonsets.apps "i-am-not-waiting-for-rollout" not found
...                                                                                          
{"currentNumberScheduled":2,"desiredNumberScheduled":2,"numberMisscheduled":0,"numberReady":0,"numberUnavailable":2,"observedGeneration":1,"updatedNumberScheduled":2}
{"currentNumberScheduled":2,"desiredNumberScheduled":2,"numberMisscheduled":0,"numberReady":0,"numberUnavailable":2,"observedGeneration":1,"updatedNumberScheduled":2}
{"currentNumberScheduled":2,"desiredNumberScheduled":2,"numberAvailable":1,"numberMisscheduled":0,"numberReady":1,"numberUnavailable":1,"observedGeneration":1,"updatedNumberScheduled":2}
{"currentNumberScheduled":2,"desiredNumberScheduled":2,"numberAvailable":2,"numberMisscheduled":0,"numberReady":2,"observedGeneration":1,"updatedNumberScheduled":2}
{"currentNumberScheduled":2,"desiredNumberScheduled":2,"numberAvailable":2,"numberMisscheduled":0,"numberReady":2,"observedGeneration":1,"updatedNumberScheduled":2}

As you can see, checking CurrentNumberScheduled is not the right property to check if when we need to wait for the rollout as this number does not represent the DaemonSet rollout status. We must check for numberReady here.

Troubleshooting - change of an existing resource

Apply of a change:

Plan: 0 to add, 1 to change, 0 to destroy.                                                                                                                                           
                                                                                         
kubernetes_daemon_set_v1.i-am-not-waiting-for-rollout: Modifying... [id=default/i-am-not-waiting-for-rollout]                                                                        
kubernetes_daemon_set_v1.i-am-not-waiting-for-rollout: Modifications complete after 2s [id=default/i-am-not-waiting-for-rollout]                                                                                                                                                                                                                                           
                                                                                          
Apply complete! Resources: 0 added, 1 changed, 0 destroyed.                                                                          

DS status output:

{"currentNumberScheduled":2,"desiredNumberScheduled":2,"numberAvailable":2,"numberMisscheduled":0,"numberReady":2,"observedGeneration":5,"updatedNumberScheduled":2}
{"currentNumberScheduled":2,"desiredNumberScheduled":2,"numberAvailable":1,"numberMisscheduled":0,"numberReady":1,"numberUnavailable":1,"observedGeneration":6,"updatedNumberScheduled":1}
{"currentNumberScheduled":2,"desiredNumberScheduled":2,"numberAvailable":1,"numberMisscheduled":0,"numberReady":1,"numberUnavailable":1,"observedGeneration":6,"updatedNumberScheduled":1}
{"currentNumberScheduled":2,"desiredNumberScheduled":2,"numberAvailable":1,"numberMisscheduled":0,"numberReady":1,"numberUnavailable":1,"observedGeneration":6,"updatedNumberScheduled":1}
{"currentNumberScheduled":2,"desiredNumberScheduled":2,"numberAvailable":1,"numberMisscheduled":0,"numberReady":1,"numberUnavailable":1,"observedGeneration":6,"updatedNumberScheduled":2}
{"currentNumberScheduled":2,"desiredNumberScheduled":2,"numberAvailable":1,"numberMisscheduled":0,"numberReady":1,"numberUnavailable":1,"observedGeneration":6,"updatedNumberScheduled":2}
{"currentNumberScheduled":2,"desiredNumberScheduled":2,"numberAvailable":2,"numberMisscheduled":0,"numberReady":2,"observedGeneration":6,"updatedNumberScheduled":2}
{"currentNumberScheduled":2,"desiredNumberScheduled":2,"numberAvailable":2,"numberMisscheduled":0,"numberReady":2,"observedGeneration":6,"updatedNumberScheduled":2}

Acceptance tests

  • [ ] Have you added an acceptance test for the functionality being added?
  • [x] Have you run the acceptance tests on this branch?

Output from acceptance testing:

$ make testacc TESTARGS='-run=TestAccKubernetesDaemonSetV1* -v' 
==> Checking that code complies with gofmt requirements...
go vet ./...
TF_ACC=1 go test "/home/stano/workspace/projects/tf/terraform-provider-kubernetes/kubernetes" -v -vet=off -run=TestAccKubernetesDaemonSetV1* -v -parallel 8 -timeout 3h
=== RUN   TestAccKubernetesDaemonSetV1_minimal
=== PAUSE TestAccKubernetesDaemonSetV1_minimal
=== RUN   TestAccKubernetesDaemonSetV1_basic
=== PAUSE TestAccKubernetesDaemonSetV1_basic
=== RUN   TestAccKubernetesDaemonSetV1_with_template_metadata
=== PAUSE TestAccKubernetesDaemonSetV1_with_template_metadata
=== RUN   TestAccKubernetesDaemonSetV1_initContainer
=== PAUSE TestAccKubernetesDaemonSetV1_initContainer
=== RUN   TestAccKubernetesDaemonSetV1_noTopLevelLabels
=== PAUSE TestAccKubernetesDaemonSetV1_noTopLevelLabels
=== RUN   TestAccKubernetesDaemonSetV1_with_tolerations
=== PAUSE TestAccKubernetesDaemonSetV1_with_tolerations
=== RUN   TestAccKubernetesDaemonSetV1_with_tolerations_unset_toleration_seconds
=== PAUSE TestAccKubernetesDaemonSetV1_with_tolerations_unset_toleration_seconds
=== RUN   TestAccKubernetesDaemonSetV1_with_container_security_context_seccomp_profile
=== PAUSE TestAccKubernetesDaemonSetV1_with_container_security_context_seccomp_profile
=== RUN   TestAccKubernetesDaemonSetV1_with_container_security_context_seccomp_localhost_profile
=== PAUSE TestAccKubernetesDaemonSetV1_with_container_security_context_seccomp_localhost_profile
=== RUN   TestAccKubernetesDaemonSetV1_with_resource_requirements
=== PAUSE TestAccKubernetesDaemonSetV1_with_resource_requirements
=== RUN   TestAccKubernetesDaemonSetV1_minimalWithTemplateNamespace
=== PAUSE TestAccKubernetesDaemonSetV1_minimalWithTemplateNamespace
=== CONT  TestAccKubernetesDaemonSetV1_minimal
=== CONT  TestAccKubernetesDaemonSetV1_with_tolerations_unset_toleration_seconds
=== CONT  TestAccKubernetesDaemonSetV1_noTopLevelLabels
=== CONT  TestAccKubernetesDaemonSetV1_with_template_metadata
=== CONT  TestAccKubernetesDaemonSetV1_with_resource_requirements
=== CONT  TestAccKubernetesDaemonSetV1_minimalWithTemplateNamespace
=== CONT  TestAccKubernetesDaemonSetV1_initContainer
=== CONT  TestAccKubernetesDaemonSetV1_with_tolerations
--- PASS: TestAccKubernetesDaemonSetV1_with_tolerations_unset_toleration_seconds (18.46s)
=== CONT  TestAccKubernetesDaemonSetV1_basic
--- PASS: TestAccKubernetesDaemonSetV1_minimal (18.49s)
=== CONT  TestAccKubernetesDaemonSetV1_with_container_security_context_seccomp_localhost_profile
--- PASS: TestAccKubernetesDaemonSetV1_noTopLevelLabels (18.54s)
=== CONT  TestAccKubernetesDaemonSetV1_with_container_security_context_seccomp_profile
--- PASS: TestAccKubernetesDaemonSetV1_with_tolerations (19.22s)
--- PASS: TestAccKubernetesDaemonSetV1_initContainer (22.81s)
--- PASS: TestAccKubernetesDaemonSetV1_with_template_metadata (31.53s)
--- PASS: TestAccKubernetesDaemonSetV1_minimalWithTemplateNamespace (32.90s)
--- PASS: TestAccKubernetesDaemonSetV1_with_container_security_context_seccomp_localhost_profile (14.42s)
--- PASS: TestAccKubernetesDaemonSetV1_with_container_security_context_seccomp_profile (23.03s)
--- PASS: TestAccKubernetesDaemonSetV1_with_resource_requirements (46.52s)
--- PASS: TestAccKubernetesDaemonSetV1_basic (28.36s)
PASS
ok  	github.com/hashicorp/terraform-provider-kubernetes/kubernetes	46.870s

Release Note

Release note for CHANGELOG:

`resource/kubernetes_daemon_set_v1`: fix an issue with the provider not waiting for rollout even `wait_for_rollout` is set to `true`

References

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

sbocinec avatar Feb 09 '24 13:02 sbocinec