zarf
zarf copied to clipboard
post-install hooks do not get ran until all chart resources are ready/healthy
Environment
Device and OS: All App version: 0.21.3 Kubernetes distro being used: k3d/k3s Other:
Steps to reproduce
- Get a helm chart that utilizes a post-install hook for a resource
- Modify helm chart that a pod doesn't become healthy until that post-install hook is ran
- Test using just helm - post-install hook runs once all resources are created and all pods end up healthy
- Create zarf package using modified helm chart
- Deploy package
Expected result
Post-install hook resources are created once all other resources have been created, pod that depends on post-install resource becomes healthy.
Actual Result
Post-install hook never runs so the zarf package never succeeds.
Severity/Priority
Medium - Blocking but has a workaround.
Additional Context
This can be worked around by adding a dataInjection since zarf doesn't wait for resources to be ready before running post-install hook resources.
That's a clever workaround (and fairly new 😂). I actually think we should just expose this as an option in the zarf.yaml for when it's needed. Thoughts @defenseunicorns/zarf?
It would be a relatively simple change in src/internal/helm/chart.go
Is there a reason we don't flip the behavior though? The default in helm is opposite the default in Zarf currently (which may cause issues with folks trying something in Zarf and Helm and wondering why one doesn't work and the other does as in this case).
what line are you referring to?
If you're referring to our use of --wait
, that's because the expectation in how all zarf components works is that the next component won't run until the previous is ready. Changing that default behavior would kind of break * everything *. That doesn't matter for helm where it doesn't know about multiple helm charts chained together, it does matter for zarf. Think the easy answer is to just expose the option in zarf.yaml
. This is actually just another poor helm design from older versions that seems to be carried over: https://github.com/helm/helm/issues/5118.
Shouldn't the use of --wait
in zarf still allow the post-install hook to run?
The library loads the resulting resources into Kubernetes. Note that if the --wait flag is set, the library will wait until all resources are in a ready state and will not run the post-install hook until they are ready.
Not according to the linked issue above. And that's all we are doing is setting wait: true, this is helm internal behavior.
This should be investigated further, I created/deployed this package: (helm chart created to try and reproduce the issue)
kind: ZarfPackageConfig
metadata:
name: "init-package-test"
components:
- name: test
description: "testing hooks"
images:
- docker.io/nginx:1.16.0
- docker.io/busybox:latest
charts:
- name: hookchart
url: https://github.com/brandtkeller/testchart.git
version: 0.1.0
namespace: test
Zarf properly waits for both the deployment pod to be ready AND the completion of the post-install hook job.
? Deploy the test component? Yes
📦 TEST COMPONENT
✔ Loading the Zarf State from the Kubernetes cluster
✔ Creating port forwarding tunnel at http://127.0.0.1:59316
✔ Storing images in the zarf registry
✔ Processing helm chart hookchart:0.1.0 from https://github.com/brandtkeller/testchart.git
✔ Zarf deployment complete
output when running with a post-install job (before it finishes):
✔ Loading the Zarf State from the Kubernetes cluster
✔ Creating port forwarding tunnel at http://127.0.0.1:60538
✔ Storing images in the zarf registry
⠸ testjob: Jobs active: 1, jobs failed: 0, jobs succeeded: 0 (19s)
If you're referring to our use of
--wait
, that's because the expectation in how all zarf components works is that the next component won't run until the previous is ready. Changing that default behavior would kind of break * everything *. That doesn't matter for helm where it doesn't know about multiple helm charts chained together, it does matter for zarf. Think the easy answer is to just expose the option inzarf.yaml
. This is actually just another poor helm design from older versions that seems to be carried over: helm/helm#5118.
We could add another mechanism to Zarf to wait for the chart between components in Zarf-land instead of helm itself without breaking everything no?
No I don't think so, it's a really non-trivial problem. I spent weeks debugging/analyzing various deployment paradigms and helm's was the most stable for ensuring reconciliation/completion prior to moving on. Helm hooks are gross and what you use when you don't have smarter orchestration such as flux/argo/fleet/knative or some operator framework. Changing this would be a very significant change to Zarf that I am not comfortable for what is an edge case we haven't seen before.
I do support exposing this control to the user though as that seems to easily meet the needs here without changing behavior for every system. Keep in mind even raw manifests/kustomizations are actually turned into helm charts for apply.
Separately, I also see value in more advanced wait logic independent of Helm for things that say are controlled by flux & friends. Something like a wait condition in a component to wait for some arbitrary K8s resource to exist or be ready before proceeding. e.g. "wait for deployment xx to be healthy" or "wait for crd xx to exist". However, I see that as an advanced feature that is separate from this particular concern.
Going to try to implement another field into the zarf.yaml to conditionalize the wait flag to helm