stackstorm-k8s
Some K8S resources do not get deleted after 'helm delete'.
Hello,
I would like to report an issue (I believe). After deleting the deployed Helm chart, I can see that some job pods remain. Here is an example of kubectl get po after helm delete --purge stackstorm on Helm 2 or helm delete stackstorm on Helm 3:
❯ kubectl get po
NAME                                        READY   STATUS      RESTARTS   AGE
stackstorm-job-st2-apikey-load-j8rkv        0/1     Completed   0          6d21h
stackstorm-job-st2-key-load-wbdxp           0/1     Completed   0          6d21h
stackstorm-job-st2-register-content-l6rmh   0/1     Completed   0          6d21h
Not only do these pods remain, but the jobs do as well. With kubectl get jobs, I can see:
❯ kubectl get job
NAME                                  COMPLETIONS   DURATION   AGE
stackstorm-job-st2-apikey-load        1/1           45s        6d21h
stackstorm-job-st2-key-load           1/1           9s         6d21h
stackstorm-job-st2-register-content   1/1           25s        6d21h
There are also PV and PVC objects remaining, but I believe that is intended?
Thanks!
Edit: I believe the reason is related to this. I think both the jobs and their pods should be deleted on helm delete, and at the very least the pods of the jobs should be deleted after they have succeeded.
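In the meantime, the leftover jobs (and their pods, via Kubernetes' default cascading deletion) can be cleaned up manually, e.g. using the job names shown above:

kubectl delete job stackstorm-job-st2-apikey-load stackstorm-job-st2-key-load stackstorm-job-st2-register-content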
Yes, that's caused by the fact that hook resources are not deleted along with the release and there is no garbage collector for them yet: https://helm.sh/docs/topics/charts_hooks/#hook-resources-are-not-managed-with-corresponding-releases
The only workaround is adding the "helm.sh/hook-delete-policy": hook-succeeded annotation (https://helm.sh/docs/topics/charts_hooks/#hook-deletion-policies), which is not desired. Instead of deleting a successful job immediately, we want to keep it around for informational reasons so the user can grab the logs and see which content was registered; otherwise this Helm magic would go unnoticed. Sadly, Helm doesn't delete the jobs once the release is removed.
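For reference, the annotation would look roughly like this on a hook's metadata (a sketch only; the hook event and the exact templates used by this chart are assumptions here):

metadata:
  annotations:
    "helm.sh/hook": post-install
    # Deletes the Job (and its pod) as soon as it succeeds -- which is exactly
    # the trade-off described above: the registration logs disappear with it.
    "helm.sh/hook-delete-policy": hook-succeeded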
It's actually a good topic to discuss, and I'm inviting others to provide their feedback. Depending on that feedback, it might be a good idea to change the behavior before the chart reaches a stable state.
An alternative, per the Helm doc's advice, is trying ttlSecondsAfterFinished (https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/#clean-up-finished-jobs-automatically) with the TTL controller (https://kubernetes.io/docs/concepts/workloads/controllers/ttlafterfinished/) to clean up the jobs automatically after some reasonable delay. However, this feature still appears to be in an alpha state.
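For illustration, this is roughly how the field would sit on a Job (a minimal sketch; the container, image, and the one-hour delay are placeholders, not the chart's actual job definitions):

apiVersion: batch/v1
kind: Job
metadata:
  name: stackstorm-job-st2-register-content
spec:
  # Once the job finishes, the TTL-after-finished controller deletes it
  # (and its pod) after this many seconds -- one hour here as an example.
  ttlSecondsAfterFinished: 3600
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: main
          image: busybox   # placeholder image
          command: ["sh", "-c", "echo register content here"]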
It looks like this hit the beta state in k8s v1.21, so it's probably safe to start adding something like this. I'm guessing the TTL would need to be defined in values.yaml (perhaps as jobs.ttlSecondsAfterFinished), probably with a large default like 604800 (1 week).
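Something along these lines in values.yaml, for example (jobs.ttlSecondsAfterFinished is just the key suggested above, not an existing chart value):

jobs:
  # Delete finished hook jobs (and their pods) this many seconds after they
  # complete. 604800 seconds = 1 week, long enough to still inspect the logs.
  ttlSecondsAfterFinished: 604800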
Quick follow-up if anyone wants to work on this: the TTL-after-finished controller feature hit stable in k8s v1.23.
NOTE: This feature will only clean up the old jobs. The only other hook we have is used for running tests (a Pod), so it is unlikely to be (and probably should not be) in any production clusters. That test Pod is only deleted automatically if it succeeds; if there's a failure, it has to be cleaned up manually (or just use a disposable cluster for testing).
k8s 1.22 went EOL on 2022-10-28, so we can safely use ttlSecondsAfterFinished now. A PR to implement this would be welcome!
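A rough idea of how the job templates could wire that value in (the values key and its placement are the assumption from the earlier comment, not the chart's current structure):

spec:
  {{- if .Values.jobs.ttlSecondsAfterFinished }}
  # Let the TTL-after-finished controller garbage-collect the finished job.
  ttlSecondsAfterFinished: {{ .Values.jobs.ttlSecondsAfterFinished }}
  {{- end }}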