tilt-extensions
Tilt helm extension fails to clean up resources and fails to restart with kyverno
Setup:
Kubernetes cluster: kind v0.20.0 go1.20.5 linux/amd64
Python version: Python 3.9.16
With this file as kyverno-test.tilt
load("ext://helm_resource", "helm_repo", "helm_resource")
load("ext://namespace", "namespace_create")

namespace_create("kyverno")

k8s_resource(
    new_name = "kyverno-ns",
    labels = ["kyverno"],
    objects = ["kyverno:namespace"],
)

helm_repo(
    name = "kyverno",
    url = "https://kyverno.github.io/kyverno/",
    labels = ["kyverno"],
)

helm_resource(
    name = "kyverno-deployment",
    chart = "kyverno/kyverno",
    labels = ["kyverno"],
    resource_deps = [
        "kyverno",
        "kyverno-ns",
    ],
    namespace = "kyverno",
)
Repro steps:
I ran the following commands in this order:
1. kind delete cluster && kind create cluster
2. tilt up -f kyverno-test.tilt
3. Ctrl-c to exit tilt from the legacy terminal
4. tilt down -f kyverno-test.tilt
5. tilt up -f kyverno-test.tilt
On step 4 I'm seeing the following issue:
Loading Tiltfile at: <path>/tilt/kyverno/kyverno-test.tilt
Successfully loaded Tiltfile (358.015303ms)
Not deleting namespaces: kyverno
Run with --delete-namespaces to delete namespaces as well.
Running cmd: python3 <path>/.local/share/tilt-dev/tilt_modules/github.com/tilt-dev/tilt-extensions/helm_resource/helm-delete-helper.py
Running cmd: helm uninstall --namespace kyverno kyverno-deployment
Error: Deleting k8s entities for cmd: python3 <path>/.local/share/tilt-dev/tilt_modules/github.com/tilt-dev/tilt-extensions/helm_resource/helm-delete-helper.py: exit status 137
And then on step 5 I see this:
kyverno-depl… │
kyverno-depl… │ Initial Build
kyverno-depl… │ STEP 1/1 — Deploying
kyverno-depl… │ Running cmd: python3 <path>/.local/share/tilt-dev/tilt_modules/github.com/tilt-dev/tilt-extensions/helm_resource/helm-apply-helper.py
kyverno-depl… │ Running cmd: ['helm', 'upgrade', '--install', '--namespace', 'kyverno', 'kyverno-deployment', 'kyverno/kyverno']
kyverno-depl… │ Error: UPGRADE FAILED: "kyverno-deployment" has no deployed releases
kyverno-depl… │ Traceback (most recent call last):
kyverno-depl… │ File "<path>/.local/share/tilt-dev/tilt_modules/github.com/tilt-dev/tilt-extensions/helm_resource/helm-apply-helper.py", line 71, in <module>
kyverno-depl… │ subprocess.check_call(install_cmd, stdout=sys.stderr)
kyverno-depl… │ File "/opt/pyenv/versions/3.9.16/lib/python3.9/subprocess.py", line 373, in check_call
kyverno-depl… │ raise CalledProcessError(retcode, cmd)
kyverno-depl… │ subprocess.CalledProcessError: Command '['helm', 'upgrade', '--install', '--namespace', 'kyverno', 'kyverno-deployment', 'kyverno/kyverno']' returned non-zero exit status 1.
kyverno-depl… │
kyverno-depl… │ ERROR: Build Failed: apply command exited with status 1
Adding --delete-namespaces to the delete command (step 4) resulted in:
% tilt down -f kyverno-test.tilt --delete-namespaces
Loading Tiltfile at: <path>/kyverno-test.tilt
Successfully loaded Tiltfile (357.493223ms)
Deleting kubernetes objects:
→ Namespace/kyverno
Running cmd: python3 <path>/.local/share/tilt-dev/tilt_modules/github.com/tilt-dev/tilt-extensions/helm_resource/helm-delete-helper.py
Running cmd: helm uninstall --namespace kyverno kyverno-deployment
Error: warning: Hook pre-delete kyverno/templates/hooks/pre-delete.yaml failed: 1 error occurred:
* jobs.batch "kyverno-deployment-hook-pre-delete" is forbidden: unable to create new content in namespace kyverno because it is being terminated
Expected outcomes:
I would like to understand why helm-delete-helper failed to delete the kyverno resources and, if possible, whether tilt can handle these types of errors so that all resources created by a tilt up get cleaned up. Thank you!
Hmmm... What does helm version show?
version.BuildInfo{Version:"v3.13.3", GitCommit:"c8b948945e52abba22ff885446a1486cb5fd3474", GitTreeState:"clean", GoVersion:"go1.21.5"}
(deleted a previous comment, i misunderstood something about the bug report)
hmmm...i can reproduce this in https://app.circleci.com/pipelines/github/tilt-dev/tilt-extensions/1504/workflows/1e98f416-7181-4bcc-b789-cf8d0b18f32c/jobs/3578, though not sure what's going on.
- exit status 137 usually means the process got a SIGKILL (128 + 9), often from some sort of OOM killer
- my theory is that something is killing helm mid uninstall and leaving it in a bad state
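For anyone reading the 137 in the tilt down output: by shell convention, an exit status above 128 means the process was terminated by signal (status - 128), so 137 maps to signal 9. A minimal sketch of that decoding, assuming a POSIX system; describe_exit_status is a hypothetical helper written for illustration, not part of tilt or the helm_resource extension:

```python
import signal


def describe_exit_status(status: int) -> str:
    """Explain a shell-style exit status, where 128 + N means 'killed by signal N'."""
    if status > 128:
        signum = status - 128
        try:
            # Look up the symbolic name for this signal number on the current platform.
            name = signal.Signals(signum).name
        except ValueError:
            name = "unknown signal"
        return f"terminated by signal {signum} ({name})"
    return f"exited normally with code {status}"


# 137 is the status reported for helm-delete-helper.py in the tilt down log.
print(describe_exit_status(137))
```

The status only says that something sent SIGKILL. It does not say who sent it, so an OOM killer and a supervising process that gave up waiting look identical from the exit code alone.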
ya, i think this is just tilt assuming the process got stuck and killing the process. we should have a more graceful shutdown, but for now, you can do something like:
update_settings(k8s_upsert_timeout_secs=120)
to give it a longer timeout.
Thanks Nick! Your workaround did the trick. Would you like me to keep this ticket open regarding having a more graceful shutdown? Or a more user-friendly timeout message?