[kube-prometheus-stack] Helm-upgrade fails due to admission-webhooks being unable to reach service
Describe the bug: It seems like the admission webhooks aren't patched to ignore failures during Helm upgrades. This causes admission webhook calls to fail while the operator pod is being patched in single-replica deployments.
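For reference, the failure policy currently configured on the webhooks can be listed before an upgrade; a sketch (not chart-specific, it simply shows every webhook configuration and its policy):
# List every webhook configuration together with its failurePolicy
kubectl get validatingwebhookconfiguration,mutatingwebhookconfiguration \
  -o custom-columns='NAME:.metadata.name,POLICY:.webhooks[*].failurePolicy'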
Version of Helm and Kubernetes: Helm Version:
$ helm version
version.BuildInfo{Version:"v3.5.4", GitCommit:"1b5edb69df3d3a08df77c9902dc17af864ff05d1", GitTreeState:"clean", GoVersion:"go1.15.11"}
Kubernetes Version:
$ kubectl version
v1.19.9-gke.1900
Which chart: kube-prometheus-stack
Which version of the chart: 16.1.2
What happened:
Error: UPGRADE FAILED: cannot patch "kube-prometheus-kube-prome-general.rules" with kind PrometheusRule: Internal error occurred: failed calling webhook "prometheusrulemutate.monitoring.coreos.com": Post "https://kube-prometheus-kube-prome-operator.test-monitoring.svc:443/admission-prometheusrules/validate?timeout=10s": no service port 443 found for service "kube-prometheus-kube-prome-operator" && cannot patch "kube-prometheus-kube-prome-k8s.rules" with kind PrometheusRule: Internal error occurred: failed calling webhook "prometheusrulemutate.monitoring.coreos.com": Post "https://kube-prometheus-kube-prome-operator.test-monitoring.svc:443/admission-prometheusrules/validate?timeout=10s": no service port 443 found for service "kube-prometheus-kube-prome-operator" && cannot patch "kube-prometheus-kube-prome-kube-apiserver-availability.rules" with kind PrometheusRule: Internal error occurred: failed calling webhook "prometheusrulemutate.monitoring.coreos.com": Post "https://kube-prometheus-kube-prome-operator.test-monitoring.svc:443/admission-prometheusrules/validate?timeout=10s": no service port 443 found for service "kube-prometheus-kube-prome-operator" && cannot patch "kube-prometheus-kube-prome-kube-apiserver-slos" with kind PrometheusRule: Internal error occurred: failed calling webhook "prometheusrulemutate.monitoring.coreos.com": Post "https://kube-prometheus-kube-prome-operator.test-monitoring.svc:443/admission-prometheusrules/validate?timeout=10s": no service port 443 found for service "kube-prometheus-kube-prome-operator" && cannot patch "kube-prometheus-kube-prome-kube-apiserver.rules" with kind PrometheusRule: Internal error occurred: failed calling webhook "prometheusrulemutate.monitoring.coreos.com": Post "https://kube-prometheus-kube-prome-operator.test-monitoring.svc:443/admission-prometheusrules/validate?timeout=10s": no service port 443 found for service "kube-prometheus-kube-prome-operator" && cannot patch "kube-prometheus-kube-prome-kube-prometheus-general.rules" with kind PrometheusRule: Internal error occurred: failed calling webhook "prometheusrulemutate.monitoring.coreos.com": Post "https://kube-prometheus-kube-prome-operator.test-monitoring.svc:443/admission-prometheusrules/validate?timeout=10s": no service port 443 found for service "kube-prometheus-kube-prome-operator" && cannot patch "kube-prometheus-kube-prome-kube-prometheus-node-recording.rules" with kind PrometheusRule: Internal error occurred: failed calling webhook "prometheusrulemutate.monitoring.coreos.com": Post "https://kube-prometheus-kube-prome-operator.test-monitoring.svc:443/admission-prometheusrules/validate?timeout=10s": no service port 443 found for service "kube-prometheus-kube-prome-operator" && cannot patch "kube-prometheus-kube-prome-kube-scheduler.rules" with kind PrometheusRule: Internal error occurred: failed calling webhook "prometheusrulemutate.monitoring.coreos.com": Post "https://kube-prometheus-kube-prome-operator.test-monitoring.svc:443/admission-prometheusrules/validate?timeout=10s": no service port 443 found for service "kube-prometheus-kube-prome-operator" && cannot patch "kube-prometheus-kube-prome-kube-state-metrics" with kind PrometheusRule: Internal error occurred: failed calling webhook "prometheusrulemutate.monitoring.coreos.com": Post "https://kube-prometheus-kube-prome-operator.test-monitoring.svc:443/admission-prometheusrules/validate?timeout=10s": no service port 443 found for service "kube-prometheus-kube-prome-operator" && cannot patch 
"kube-prometheus-kube-prome-kubelet.rules" with kind PrometheusRule: Internal error occurred: failed calling webhook "prometheusrulemutate.monitoring.coreos.com": Post "https://kube-prometheus-kube-prome-operator.test-monitoring.svc:443/admission-prometheusrules/validate?timeout=10s": no service port 443 found for service "kube-prometheus-kube-prome-operator" && cannot patch "kube-prometheus-kube-prome-kubernetes-apps" with kind PrometheusRule: Internal error occurred: failed calling webhook "prometheusrulemutate.monitoring.coreos.com": Post "https://kube-prometheus-kube-prome-operator.test-monitoring.svc:443/admission-prometheusrules/validate?timeout=10s": no service port 443 found for service "kube-prometheus-kube-prome-operator" && cannot patch "kube-prometheus-kube-prome-kubernetes-resources" with kind PrometheusRule: Internal error occurred: failed calling webhook "prometheusrulemutate.monitoring.coreos.com": Post "https://kube-prometheus-kube-prome-operator.test-monitoring.svc:443/admission-prometheusrules/validate?timeout=10s": no service port 443 found for service "kube-prometheus-kube-prome-operator" && cannot patch "kube-prometheus-kube-prome-kubernetes-storage" with kind PrometheusRule: Internal error occurred: failed calling webhook "prometheusrulemutate.monitoring.coreos.com": Post "https://kube-prometheus-kube-prome-operator.test-monitoring.svc:443/admission-prometheusrules/validate?timeout=10s": no service port 443 found for service "kube-prometheus-kube-prome-operator" && cannot patch "kube-prometheus-kube-prome-kubernetes-system-apiserver" with kind PrometheusRule: Internal error occurred: failed calling webhook "prometheusrulemutate.monitoring.coreos.com": Post "https://kube-prometheus-kube-prome-operator.test-monitoring.svc:443/admission-prometheusrules/validate?timeout=10s": no service port 443 found for service "kube-prometheus-kube-prome-operator" && cannot patch "kube-prometheus-kube-prome-kubernetes-system-controller-manager" with kind PrometheusRule: Internal error occurred: failed calling webhook "prometheusrulemutate.monitoring.coreos.com": Post "https://kube-prometheus-kube-prome-operator.test-monitoring.svc:443/admission-prometheusrules/validate?timeout=10s": no service port 443 found for service "kube-prometheus-kube-prome-operator" && cannot patch "kube-prometheus-kube-prome-kubernetes-system-kubelet" with kind PrometheusRule: Internal error occurred: failed calling webhook "prometheusrulemutate.monitoring.coreos.com": Post "https://kube-prometheus-kube-prome-operator.test-monitoring.svc:443/admission-prometheusrules/validate?timeout=10s": no service port 443 found for service "kube-prometheus-kube-prome-operator" && cannot patch "kube-prometheus-kube-prome-kubernetes-system-scheduler" with kind PrometheusRule: Internal error occurred: failed calling webhook "prometheusrulemutate.monitoring.coreos.com": Post "https://kube-prometheus-kube-prome-operator.test-monitoring.svc:443/admission-prometheusrules/validate?timeout=10s": no service port 443 found for service "kube-prometheus-kube-prome-operator" && cannot patch "kube-prometheus-kube-prome-kubernetes-system" with kind PrometheusRule: Internal error occurred: failed calling webhook "prometheusrulemutate.monitoring.coreos.com": Post "https://kube-prometheus-kube-prome-operator.test-monitoring.svc:443/admission-prometheusrules/validate?timeout=10s": no service port 443 found for service "kube-prometheus-kube-prome-operator" && cannot patch "kube-prometheus-kube-prome-node-exporter.rules" with kind 
PrometheusRule: Internal error occurred: failed calling webhook "prometheusrulemutate.monitoring.coreos.com": Post "https://kube-prometheus-kube-prome-operator.test-monitoring.svc:443/admission-prometheusrules/validate?timeout=10s": no service port 443 found for service "kube-prometheus-kube-prome-operator" && cannot patch "kube-prometheus-kube-prome-node-exporter" with kind PrometheusRule: Internal error occurred: failed calling webhook "prometheusrulemutate.monitoring.coreos.com": Post "https://kube-prometheus-kube-prome-operator.test-monitoring.svc:443/admission-prometheusrules/validate?timeout=10s": no service port 443 found for service "kube-prometheus-kube-prome-operator" && cannot patch "kube-prometheus-kube-prome-node-network" with kind PrometheusRule: Internal error occurred: failed calling webhook "prometheusrulemutate.monitoring.coreos.com": Post "https://kube-prometheus-kube-prome-operator.test-monitoring.svc:443/admission-prometheusrules/validate?timeout=10s": no service port 443 found for service "kube-prometheus-kube-prome-operator" && cannot patch "kube-prometheus-kube-prome-node.rules" with kind PrometheusRule: Internal error occurred: failed calling webhook "prometheusrulemutate.monitoring.coreos.com": Post "https://kube-prometheus-kube-prome-operator.test-monitoring.svc:443/admission-prometheusrules/validate?timeout=10s": no service port 443 found for service "kube-prometheus-kube-prome-operator" && cannot patch "kube-prometheus-kube-prome-prometheus-operator" with kind PrometheusRule: Internal error occurred: failed calling webhook "prometheusrulemutate.monitoring.coreos.com": Post "https://kube-prometheus-kube-prome-operator.test-monitoring.svc:443/admission-prometheusrules/validate?timeout=10s": no service port 443 found for service "kube-prometheus-kube-prome-operator" && cannot patch "kube-prometheus-kube-prome-prometheus" with kind PrometheusRule: Internal error occurred: failed calling webhook "prometheusrulemutate.monitoring.coreos.com": Post "https://kube-prometheus-kube-prome-operator.test-monitoring.svc:443/admission-prometheusrules/validate?timeout=10s": no service port 443 found for service "kube-prometheus-kube-prome-operator"
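To confirm the symptom reported above ("no service port 443 found"), the operator Service and its endpoints can be inspected while the upgrade is failing; a sketch using the release and namespace names from this report:
# Inspect the operator Service and its endpoints during the failed upgrade
kubectl -n test-monitoring get svc kube-prometheus-kube-prome-operator -o yaml
kubectl -n test-monitoring get endpoints kube-prometheus-kube-prome-operator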
What you expected to happen:
Helm-upgrade to be completed successfully.
How to reproduce it (as minimally and precisely as possible):
Upgrade the Helm chart; the failure seems to occur randomly. Sometimes the upgrade works.
Changed values of values.yaml (only put values which differ from the defaults):
values.yaml
defaultRules:
  rules:
    alertmanager: false
    etcd: false
alertmanager:
  enabled: false
coreDns:
  enabled: false
kubeDns:
  enabled: false
kubeEtcd:
  enabled: false
prometheusOperator:
  tls:
    enabled: false
kubeStateMetrics:
  enabled: false
nodeExporter:
  enabled: false
prometheus:
  prometheusSpec:
    serviceMonitorSelectorNilUsesHelmValues: false
The helm command that you execute and failing/misfunctioning:
For example:
helm upgrade --install my-release prometheus-community/kube-prometheus-stack --version 16.1.2 --values values.yaml
Helm values set after installation/upgrade:
alertmanager:
  enabled: false
coreDns:
  enabled: false
defaultRules:
  rules:
    alertmanager: false
    etcd: false
kubeDns:
  enabled: false
kubeEtcd:
  enabled: false
kubeStateMetrics:
  enabled: false
nodeExporter:
  enabled: false
prometheus:
  prometheusSpec:
    serviceMonitorSelectorNilUsesHelmValues: false
prometheusOperator:
  tls:
    enabled: false
Anything else we need to know:
Something to keep in mind is that the Prometheus cluster is single-replica, so upgrades are not highly available.
We currently use this patch job to set up certs and the failure policy post-install: https://github.com/prometheus-community/helm-charts/blob/main/charts/kube-prometheus-stack/templates/prometheus-operator/admission-webhooks/job-patch/job-patchWebhook.yaml
This same job could be used to set just the failure policy (to Ignore) pre-install, as sketched below.
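As an illustration of that idea (not the chart's current behaviour), a pre-install/pre-upgrade step could set the failure policy directly; a minimal sketch, assuming the chart's default webhook configuration name for this release (kube-prometheus-kube-prome-admission, a hypothetical name here) and that the first webhook entry is the one to relax:
# Hypothetical pre-upgrade step: relax the failure policy on the chart's own webhook configurations
kubectl patch validatingwebhookconfiguration kube-prometheus-kube-prome-admission \
  --type=json -p='[{"op": "replace", "path": "/webhooks/0/failurePolicy", "value": "Ignore"}]'
kubectl patch mutatingwebhookconfiguration kube-prometheus-kube-prome-admission \
  --type=json -p='[{"op": "replace", "path": "/webhooks/0/failurePolicy", "value": "Ignore"}]'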
Currently using a temporary workaround to disable admission-webhook failures before install, which resolved the issue:
kubectl get ValidatingWebhookConfiguration -o yaml \
| sed "s/failurePolicy: Fail/failurePolicy: Ignore/" \
| kubectl apply -f -
kubectl get MutatingWebhookConfiguration -o yaml \
| sed "s/failurePolicy: Fail/failurePolicy: Ignore/" \
| kubectl apply -f -
But a more permanent solution within the chart would be great. I don't mind creating a PR for this if the community agrees this is a viable solution.
We are facing the same problem. K8s v1.17 Chart: 16.10.0
Having the same problem here.
k8s v1.21.2 Chart: 18.0.0
Same problem. k8s v1.22.1, Chart: 18.0.3. Seems like this started sometime in v17.
Same problem here.
k8s v1.21.4 Chart: 18.0.3
+1
I also have this problem
k8s: v1.18.8 helm: v3.6.3
+1
helm: 3.7.1 k8s server: 1.20.7 kubectl: 1.22.3
I regularly see this as well
Chart/app: kube-prometheus-stack-18.1.0/0.50.0 Helm: v3.5.4 K8S: 1.18 kubectl: v1.20.2
+1
helm: v3.5.4 k8s: v1.21.2 kubectl: v1.21.0
Any update here? I am also getting the same error.
Please check if this helps: https://github.com/prometheus-community/helm-charts/issues/108
same issue here 👍
k8S version : Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.4", GitCommit:"b695d79d4f967c403a96986f1750a35eb75e75f1", GitTreeState:"clean", BuildDate:"2021-11-17T15:48:33Z", GoVersion:"go1.16.10", Compiler:"gc", Platform:"linux/amd64"} Server Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.4", GitCommit:"b695d79d4f967c403a96986f1750a35eb75e75f1", GitTreeState:"clean", BuildDate:"2021-11-17T15:42:41Z", GoVersion:"go1.16.10", Compiler:"gc", Platform:"linux/amd64"}
Same on our side.
eksctl: v1.22.4
k8s: v1.21.2
helm: v3.7.2
@vadlungu if you are trying to install Prometheus, visit my GitHub repo; you'll find the solution there.
Same issue. k8s: 1.23, helm: 3.8.0
Hi, thanks @Jaskaranbir for the provided workaround 👍.
For those who want to execute the associated commands, please be aware that they act on all validatingwebhookconfiguration and mutatingwebhookconfiguration resources, which is probably not what you want in most cases (especially if you have a huge Kubernetes cluster). Pick only the resources associated with the kube-prometheus-stack instead, as explained in https://github.com/prometheus-community/helm-charts/issues/108#issuecomment-825689328
In my case I executed the following commands.
# Check existing objects
kubectl get validatingwebhookconfiguration -A
kubectl get mutatingwebhookconfiguration -A
# Ignore failures
kubectl get validatingwebhookconfiguration NAME_RETURNED_IN_PREVIOUS_COMMAND -o yaml | sed "s/failurePolicy: Fail/failurePolicy: Ignore/" | kubectl apply -f -
kubectl get mutatingwebhookconfiguration NAME_RETURNED_IN_PREVIOUS_COMMAND -o yaml | sed "s/failurePolicy: Fail/failurePolicy: Ignore/" | kubectl apply -f -
# Perform Helm changes ...
# Revert policy changes
kubectl get validatingwebhookconfiguration NAME_RETURNED_IN_PREVIOUS_COMMAND -o yaml | sed "s/failurePolicy: Ignore/failurePolicy: Fail/" | kubectl apply -f -
kubectl get mutatingwebhookconfiguration NAME_RETURNED_IN_PREVIOUS_COMMAND -o yaml | sed "s/failurePolicy: Ignore/failurePolicy: Fail/" | kubectl apply -f -
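For repeated upgrades, the same flow can be wrapped in a small script; a sketch, assuming the validating and mutating configurations share a single name (passed as the first argument, as the chart's defaults do) and that the release name and values file match the ones from this thread:
#!/usr/bin/env bash
# Hypothetical wrapper: relax the failure policy, run the upgrade, then restore the policy.
set -euo pipefail
WEBHOOK_NAME="$1"

# Flip failurePolicy to Ignore on both webhook configurations
for kind in validatingwebhookconfiguration mutatingwebhookconfiguration; do
  kubectl get "$kind" "$WEBHOOK_NAME" -o yaml \
    | sed "s/failurePolicy: Fail/failurePolicy: Ignore/" \
    | kubectl apply -f -
done

# Perform the upgrade (release name and values file are placeholders)
helm upgrade --install my-release prometheus-community/kube-prometheus-stack --values values.yaml

# Restore failurePolicy to Fail
for kind in validatingwebhookconfiguration mutatingwebhookconfiguration; do
  kubectl get "$kind" "$WEBHOOK_NAME" -o yaml \
    | sed "s/failurePolicy: Ignore/failurePolicy: Fail/" \
    | kubectl apply -f -
done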
We are facing the same issue, frequently having to ignore the webhooks to proceed with small upgrades
- kube-prometheus-stack 34.5.1
- helm 3.8.0
- kubernetes v1.22.5
I have faced the same problem. After deleting all the Grafana manifests, I redeployed with:
helm install grafana grafana/grafana --set persistence.enabled=true
Note that "helm upgrade --install ..." did not work in my case, and the pod crashed anyway.