helm-charts
[kube-prometheus-stack] deployment via argocd stuck in "pending deletion" for kube-prometheus-stack-admission-create job
Describe the bug
When deploying kube-prometheus-stack with Argo CD, the sync sometimes gets stuck in "pending deletion" for the Job kube-prometheus-stack-admission-create. We then have to terminate the Argo CD sync and resync manually, which defeats full automation.
According to https://github.com/argoproj/argo-cd/issues/6880 this is a well-known problem when a PreHook Job has
ttlSecondsAfterFinished: 0
defined: Kubernetes deletes the Job immediately after it finishes, and Argo CD also tries to delete the same Job, so the two deletions race against each other.
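To make the race concrete, here is a minimal sketch of a hook Job of the kind described above. The Job name, image, and command are hypothetical placeholders and not taken from the chart; only the combination of a PreSync hook, a hook delete policy, and a zero TTL is the point.

```yaml
# Illustrative only: not the chart's actual admission-create Job.
apiVersion: batch/v1
kind: Job
metadata:
  name: example-pre-sync-job          # hypothetical name
  annotations:
    # Argo CD runs this Job before the main sync and removes it on success.
    argocd.argoproj.io/hook: PreSync
    argocd.argoproj.io/hook-delete-policy: HookSucceeded
spec:
  # The Kubernetes TTL controller may delete the Job as soon as it finishes,
  # at the same moment Argo CD tries to delete it as a completed hook.
  ttlSecondsAfterFinished: 0
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: main
          image: busybox               # hypothetical image
          command: ["sh", "-c", "echo done"]
```

With both deletion paths active, whichever one loses the race can leave the sync waiting on an object that is already gone.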
I propose making the value of ttlSecondsAfterFinished in https://github.com/prometheus-community/helm-charts/blob/9c41858ac9714483638d78fb560577dc37e55875/charts/kube-prometheus-stack/templates/prometheus-operator/admission-webhooks/job-patch/job-createSecret.yaml#L19 configurable via a Helm value.
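A rough sketch of what such a template change could look like. The values key prometheusOperator.admissionWebhooks.patch.ttlSecondsAfterFinished is an assumption made for illustration and does not exist in chart version 58.2.2; the rest of the Job spec is omitted.

```yaml
# Sketch of job-createSecret.yaml, not the chart's actual template.
# Assumes a hypothetical values key:
#   prometheusOperator.admissionWebhooks.patch.ttlSecondsAfterFinished
apiVersion: batch/v1
kind: Job
spec:
  {{- with .Values.prometheusOperator.admissionWebhooks.patch.ttlSecondsAfterFinished }}
  # Rendered only when the value is set and non-zero. `with` treats 0 as
  # empty, so a value of 0 leaves the field out and no TTL is applied.
  ttlSecondsAfterFinished: {{ . }}
  {{- end }}
```

Leaving the value unset would keep the field out of the rendered Job, while setting it to something like 60 would give Argo CD time to observe and clean up the finished hook before the TTL controller acts.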
What's your helm version?
3.14.3
What's your kubectl version?
v1.29.2
Which chart?
kube-prometheus-stack
What's the chart version?
58.2.2
What happened?
When deploying kube-prometheus-stack with Argo CD, the sync sometimes gets stuck in "pending deletion" for the Job kube-prometheus-stack-admission-create. We then have to terminate the Argo CD sync and resync manually, which defeats full automation.
What you expected to happen?
The Argo CD sync of this chart completes without failing or getting stuck.
How to reproduce it?
Set up an Argo CD application with:
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: kube-prometheus-stack-jokl
spec:
  destination:
    name: ''
    namespace: jokl-test
    server: 'https://kubernetes.default.svc'
  source:
    path: ''
    repoURL: 'https://prometheus-community.github.io/helm-charts'
    targetRevision: 58.2.2
    chart: kube-prometheus-stack
  sources: []
  project: default
  syncPolicy:
    syncOptions:
      - CreateNamespace=true
      - ServerSideApply=true
Since this is a race condition, it does not happen on every sync, but it happens very often.
Enter the changed values of values.yaml?
NONE
Enter the command that you execute and failing/misfunctioning.
n.a.
Anything else we need to know?
No response
Right after opening this issue and a PR for it, I realized that ttlSecondsAfterFinished is not set at all: the attribute is wrapped in a "batch/v1alpha1" condition, which is not met on current clusters.
https://github.com/prometheus-community/helm-charts/blob/9c41858ac9714483638d78fb560577dc37e55875/charts/kube-prometheus-stack/templates/prometheus-operator/admission-webhooks/job-patch/job-createSecret.yaml#L17-L20
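For readers who do not want to follow the link, the guarded block is roughly of the following shape. This is a paraphrase written from the description above, not a verbatim copy of the template lines.

```yaml
# Approximate shape of the condition, reconstructed for illustration.
spec:
  {{- if .Capabilities.APIVersions.Has "batch/v1alpha1" }}
  # Only rendered when the cluster advertises the batch/v1alpha1 API group
  # version, which current clusters do not, so the field is never emitted.
  ttlSecondsAfterFinished: 0
  {{- end }}
```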
So I need to investigate again. I will leave this issue open, but please consider it still under investigation.
Funnily enough, we never ran into this issue before this change, but now we need to set ttlSecondsAfterFinished because we consistently get this error 😂