
[kube-prometheus-stack] Helm-upgrade fails due to admission-webhooks being unable to reach service

Open Jaskaranbir opened this issue 3 years ago • 37 comments

Describe the bug: It seems like the admission webhooks aren't patched to ignore failures during Helm upgrades. This causes the admission webhooks to fail while the pod is being patched in single-replica deployments.

Version of Helm and Kubernetes: Helm Version:

$ helm version
version.BuildInfo{Version:"v3.5.4", GitCommit:"1b5edb69df3d3a08df77c9902dc17af864ff05d1", GitTreeState:"clean", GoVersion:"go1.15.11"}

Kubernetes Version:

$ kubectl version
v1.19.9-gke.1900

Which chart: kube-prometheus-stack

Which version of the chart: 16.1.2

What happened:

Error: UPGRADE FAILED: cannot patch "kube-prometheus-kube-prome-general.rules" with kind PrometheusRule: Internal error occurred: failed calling webhook "prometheusrulemutate.monitoring.coreos.com": Post "https://kube-prometheus-kube-prome-operator.test-monitoring.svc:443/admission-prometheusrules/validate?timeout=10s": no service port 443 found for service "kube-prometheus-kube-prome-operator" && cannot patch "kube-prometheus-kube-prome-k8s.rules" with kind PrometheusRule: Internal error occurred: failed calling webhook "prometheusrulemutate.monitoring.coreos.com": Post "https://kube-prometheus-kube-prome-operator.test-monitoring.svc:443/admission-prometheusrules/validate?timeout=10s": no service port 443 found for service "kube-prometheus-kube-prome-operator" && cannot patch "kube-prometheus-kube-prome-kube-apiserver-availability.rules" with kind PrometheusRule: Internal error occurred: failed calling webhook "prometheusrulemutate.monitoring.coreos.com": Post "https://kube-prometheus-kube-prome-operator.test-monitoring.svc:443/admission-prometheusrules/validate?timeout=10s": no service port 443 found for service "kube-prometheus-kube-prome-operator" && cannot patch "kube-prometheus-kube-prome-kube-apiserver-slos" with kind PrometheusRule: Internal error occurred: failed calling webhook "prometheusrulemutate.monitoring.coreos.com": Post "https://kube-prometheus-kube-prome-operator.test-monitoring.svc:443/admission-prometheusrules/validate?timeout=10s": no service port 443 found for service "kube-prometheus-kube-prome-operator" && cannot patch "kube-prometheus-kube-prome-kube-apiserver.rules" with kind PrometheusRule: Internal error occurred: failed calling webhook "prometheusrulemutate.monitoring.coreos.com": Post "https://kube-prometheus-kube-prome-operator.test-monitoring.svc:443/admission-prometheusrules/validate?timeout=10s": no service port 443 found for service "kube-prometheus-kube-prome-operator" && cannot patch "kube-prometheus-kube-prome-kube-prometheus-general.rules" with kind PrometheusRule: Internal error occurred: failed calling webhook "prometheusrulemutate.monitoring.coreos.com": Post "https://kube-prometheus-kube-prome-operator.test-monitoring.svc:443/admission-prometheusrules/validate?timeout=10s": no service port 443 found for service "kube-prometheus-kube-prome-operator" && cannot patch "kube-prometheus-kube-prome-kube-prometheus-node-recording.rules" with kind PrometheusRule: Internal error occurred: failed calling webhook "prometheusrulemutate.monitoring.coreos.com": Post "https://kube-prometheus-kube-prome-operator.test-monitoring.svc:443/admission-prometheusrules/validate?timeout=10s": no service port 443 found for service "kube-prometheus-kube-prome-operator" && cannot patch "kube-prometheus-kube-prome-kube-scheduler.rules" with kind PrometheusRule: Internal error occurred: failed calling webhook "prometheusrulemutate.monitoring.coreos.com": Post "https://kube-prometheus-kube-prome-operator.test-monitoring.svc:443/admission-prometheusrules/validate?timeout=10s": no service port 443 found for service "kube-prometheus-kube-prome-operator" && cannot patch "kube-prometheus-kube-prome-kube-state-metrics" with kind PrometheusRule: Internal error occurred: failed calling webhook "prometheusrulemutate.monitoring.coreos.com": Post "https://kube-prometheus-kube-prome-operator.test-monitoring.svc:443/admission-prometheusrules/validate?timeout=10s": no service port 443 found for service "kube-prometheus-kube-prome-operator" && cannot patch 
"kube-prometheus-kube-prome-kubelet.rules" with kind PrometheusRule: Internal error occurred: failed calling webhook "prometheusrulemutate.monitoring.coreos.com": Post "https://kube-prometheus-kube-prome-operator.test-monitoring.svc:443/admission-prometheusrules/validate?timeout=10s": no service port 443 found for service "kube-prometheus-kube-prome-operator" && cannot patch "kube-prometheus-kube-prome-kubernetes-apps" with kind PrometheusRule: Internal error occurred: failed calling webhook "prometheusrulemutate.monitoring.coreos.com": Post "https://kube-prometheus-kube-prome-operator.test-monitoring.svc:443/admission-prometheusrules/validate?timeout=10s": no service port 443 found for service "kube-prometheus-kube-prome-operator" && cannot patch "kube-prometheus-kube-prome-kubernetes-resources" with kind PrometheusRule: Internal error occurred: failed calling webhook "prometheusrulemutate.monitoring.coreos.com": Post "https://kube-prometheus-kube-prome-operator.test-monitoring.svc:443/admission-prometheusrules/validate?timeout=10s": no service port 443 found for service "kube-prometheus-kube-prome-operator" && cannot patch "kube-prometheus-kube-prome-kubernetes-storage" with kind PrometheusRule: Internal error occurred: failed calling webhook "prometheusrulemutate.monitoring.coreos.com": Post "https://kube-prometheus-kube-prome-operator.test-monitoring.svc:443/admission-prometheusrules/validate?timeout=10s": no service port 443 found for service "kube-prometheus-kube-prome-operator" && cannot patch "kube-prometheus-kube-prome-kubernetes-system-apiserver" with kind PrometheusRule: Internal error occurred: failed calling webhook "prometheusrulemutate.monitoring.coreos.com": Post "https://kube-prometheus-kube-prome-operator.test-monitoring.svc:443/admission-prometheusrules/validate?timeout=10s": no service port 443 found for service "kube-prometheus-kube-prome-operator" && cannot patch "kube-prometheus-kube-prome-kubernetes-system-controller-manager" with kind PrometheusRule: Internal error occurred: failed calling webhook "prometheusrulemutate.monitoring.coreos.com": Post "https://kube-prometheus-kube-prome-operator.test-monitoring.svc:443/admission-prometheusrules/validate?timeout=10s": no service port 443 found for service "kube-prometheus-kube-prome-operator" && cannot patch "kube-prometheus-kube-prome-kubernetes-system-kubelet" with kind PrometheusRule: Internal error occurred: failed calling webhook "prometheusrulemutate.monitoring.coreos.com": Post "https://kube-prometheus-kube-prome-operator.test-monitoring.svc:443/admission-prometheusrules/validate?timeout=10s": no service port 443 found for service "kube-prometheus-kube-prome-operator" && cannot patch "kube-prometheus-kube-prome-kubernetes-system-scheduler" with kind PrometheusRule: Internal error occurred: failed calling webhook "prometheusrulemutate.monitoring.coreos.com": Post "https://kube-prometheus-kube-prome-operator.test-monitoring.svc:443/admission-prometheusrules/validate?timeout=10s": no service port 443 found for service "kube-prometheus-kube-prome-operator" && cannot patch "kube-prometheus-kube-prome-kubernetes-system" with kind PrometheusRule: Internal error occurred: failed calling webhook "prometheusrulemutate.monitoring.coreos.com": Post "https://kube-prometheus-kube-prome-operator.test-monitoring.svc:443/admission-prometheusrules/validate?timeout=10s": no service port 443 found for service "kube-prometheus-kube-prome-operator" && cannot patch "kube-prometheus-kube-prome-node-exporter.rules" with kind 
PrometheusRule: Internal error occurred: failed calling webhook "prometheusrulemutate.monitoring.coreos.com": Post "https://kube-prometheus-kube-prome-operator.test-monitoring.svc:443/admission-prometheusrules/validate?timeout=10s": no service port 443 found for service "kube-prometheus-kube-prome-operator" && cannot patch "kube-prometheus-kube-prome-node-exporter" with kind PrometheusRule: Internal error occurred: failed calling webhook "prometheusrulemutate.monitoring.coreos.com": Post "https://kube-prometheus-kube-prome-operator.test-monitoring.svc:443/admission-prometheusrules/validate?timeout=10s": no service port 443 found for service "kube-prometheus-kube-prome-operator" && cannot patch "kube-prometheus-kube-prome-node-network" with kind PrometheusRule: Internal error occurred: failed calling webhook "prometheusrulemutate.monitoring.coreos.com": Post "https://kube-prometheus-kube-prome-operator.test-monitoring.svc:443/admission-prometheusrules/validate?timeout=10s": no service port 443 found for service "kube-prometheus-kube-prome-operator" && cannot patch "kube-prometheus-kube-prome-node.rules" with kind PrometheusRule: Internal error occurred: failed calling webhook "prometheusrulemutate.monitoring.coreos.com": Post "https://kube-prometheus-kube-prome-operator.test-monitoring.svc:443/admission-prometheusrules/validate?timeout=10s": no service port 443 found for service "kube-prometheus-kube-prome-operator" && cannot patch "kube-prometheus-kube-prome-prometheus-operator" with kind PrometheusRule: Internal error occurred: failed calling webhook "prometheusrulemutate.monitoring.coreos.com": Post "https://kube-prometheus-kube-prome-operator.test-monitoring.svc:443/admission-prometheusrules/validate?timeout=10s": no service port 443 found for service "kube-prometheus-kube-prome-operator" && cannot patch "kube-prometheus-kube-prome-prometheus" with kind PrometheusRule: Internal error occurred: failed calling webhook "prometheusrulemutate.monitoring.coreos.com": Post "https://kube-prometheus-kube-prome-operator.test-monitoring.svc:443/admission-prometheusrules/validate?timeout=10s": no service port 443 found for service "kube-prometheus-kube-prome-operator"

What you expected to happen:

The Helm upgrade to complete successfully.

How to reproduce it (as minimally and precisely as possible):

Upgrade the Helm chart; the failure seems to occur randomly. Sometimes the upgrade works.

Changed values of values.yaml (only put values which differ from the defaults):

values.yaml

defaultRules:
  rules:
    alertmanager: false
    etcd: false

alertmanager:
  enabled: false

coreDns:
  enabled: false

kubeDns:
  enabled: false

kubeEtcd:
  enabled: false

prometheusOperator:
  tls:
    enabled: false

kubeStateMetrics:
  enabled: false

nodeExporter:
  enabled: false

prometheus:
  prometheusSpec:
    serviceMonitorSelectorNilUsesHelmValues: false

The helm command that you execute and failing/misfunctioning:

For example:

helm upgrade --install my-release prometheus-community/kube-prometheus-stack --version 16.1.2 --values values.yaml

Helm values set after installation/upgrade:

alertmanager:
  enabled: false
coreDns:
  enabled: false
defaultRules:
  rules:
    alertmanager: false
    etcd: false
kubeDns:
  enabled: false
kubeEtcd:
  enabled: false
kubeStateMetrics:
  enabled: false
nodeExporter:
  enabled: false
prometheus:
  prometheusSpec:
    serviceMonitorSelectorNilUsesHelmValues: false
prometheusOperator:
  tls:
    enabled: false

Anything else we need to know:

Something to keep in mind is that the Prometheus cluster is single-replica, so upgrades are not highly available.

We currently use this patch job to set up the certs/failure policy post-install: https://github.com/prometheus-community/helm-charts/blob/main/charts/kube-prometheus-stack/templates/prometheus-operator/admission-webhooks/job-patch/job-patchWebhook.yaml
The same job could be used to set just the failure policy (to Ignore) pre-install.
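
A rough sketch of what a pre-install/pre-upgrade variant of that job could look like, mirroring the flags of the chart's existing patch job (the resource names, namespace, image tag, and RBAC wiring below are placeholders/assumptions, not values taken from the chart, so verify them against the actual templates):

apiVersion: batch/v1
kind: Job
metadata:
  # Hypothetical name; the chart would normally derive this from the release name
  name: kube-prometheus-admission-pre-patch
  annotations:
    # Run before install/upgrade so the webhooks tolerate the operator service being unavailable
    "helm.sh/hook": pre-install,pre-upgrade
    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
spec:
  template:
    spec:
      restartPolicy: OnFailure
      # Needs a service account bound to RBAC that allows patching webhook configurations
      serviceAccountName: kube-prometheus-admission
      containers:
        - name: patch
          # Same image family the chart's patch jobs use; the tag here is an assumption
          image: k8s.gcr.io/ingress-nginx/kube-webhook-certgen:v1.0
          args:
            - patch
            - --webhook-name=kube-prometheus-kube-prome-admission   # hypothetical, must match the release's webhook configs
            - --namespace=test-monitoring
            - --secret-name=kube-prometheus-kube-prome-admission
            - --patch-failure-policy=Ignore

The chart's regular post-install/post-upgrade patch job would then restore the failure policy to Fail after the upgrade finishes.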

We are currently using a temporary workaround that disables admission-webhook failures before the install, which resolved the issue:

kubectl get ValidatingWebhookConfiguration -o yaml \
  | sed "s/failurePolicy: Fail/failurePolicy: Ignore/" \
  | kubectl apply -f -
kubectl get MutatingWebhookConfiguration -o yaml \
  | sed "s/failurePolicy: Fail/failurePolicy: Ignore/" \
  | kubectl apply -f -

But a more permanent solution within the chart would be great. I don't mind creating a PR for this if the community agrees it is a viable solution.
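
In the meantime, the bluntest values-level escape hatch seems to be disabling the admission webhooks entirely. A minimal sketch; whether a failurePolicy key is exposed at all depends on the chart version, so treat the commented line as an assumption to check against the chart's values.yaml:

prometheusOperator:
  admissionWebhooks:
    # Removes the validating/mutating webhooks entirely, at the cost of losing PrometheusRule validation
    enabled: false
    # Alternatively, if your chart version exposes it, keep the webhooks but relax the failure policy:
    # failurePolicy: Ignore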

Jaskaranbir avatar Jun 04 '21 14:06 Jaskaranbir

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.

stale[bot] avatar Jul 06 '21 09:07 stale[bot]

We are facing the same problem. K8s v1.17 Chart: 16.10.0

tlperini avatar Jul 09 '21 13:07 tlperini

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.

stale[bot] avatar Aug 08 '21 13:08 stale[bot]

having the same problem here.

k8s v.1.21.2 Chart: 18.0.0

rafaribe avatar Aug 20 '21 11:08 rafaribe

same problem k8s v1.22.1 Chart: 18.0.3 - seems like this started sometime in v17

ruckc avatar Sep 03 '21 11:09 ruckc

Same problem here.

k8s v1.21.4 Chart: 18.0.3

timbrd avatar Sep 05 '21 05:09 timbrd

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.

stale[bot] avatar Oct 05 '21 15:10 stale[bot]

+1

AndrewSav avatar Oct 18 '21 00:10 AndrewSav

I also have this problem

k8s: v1.18.8 helm: v3.6.3

TheGthr avatar Oct 26 '21 13:10 TheGthr

+1

helm: 3.7.1 k8s server: 1.20.7 kubectl: 1.22.3

ghost avatar Nov 17 '21 20:11 ghost

I regularly see this as well

Chart/app: kube-prometheus-stack-18.1.0/0.50.0 Helm: v3.5.4 K8S: 1.18 kubectl: v1.20.2

linxcat avatar Nov 29 '21 17:11 linxcat

+1

helm: v3.5.4 k8s: v1.21.2 kubectl: v1.21.0

yum-dev avatar Dec 01 '21 15:12 yum-dev

Any update here? I am also getting the same error.

infa-mlagad avatar Dec 09 '21 07:12 infa-mlagad

Please check if this helps: https://github.com/prometheus-community/helm-charts/issues/108

infa-mlagad avatar Dec 09 '21 10:12 infa-mlagad

same issue here 👍

K8s version:
Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.4", GitCommit:"b695d79d4f967c403a96986f1750a35eb75e75f1", GitTreeState:"clean", BuildDate:"2021-11-17T15:48:33Z", GoVersion:"go1.16.10", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.4", GitCommit:"b695d79d4f967c403a96986f1750a35eb75e75f1", GitTreeState:"clean", BuildDate:"2021-11-17T15:42:41Z", GoVersion:"go1.16.10", Compiler:"gc", Platform:"linux/amd64"}

SelimBF avatar Dec 10 '21 20:12 SelimBF

Same on our side.

eksctl: v1.22.4
k8s: v1.21.2
helm: v3.7.2"

vadlungu avatar Dec 16 '21 15:12 vadlungu

@vadlungu if you are trying to install Prometheus, visit my GitHub repo; you will find the solution there.

SelimBF avatar Dec 16 '21 20:12 SelimBF

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.

stale[bot] avatar Jan 16 '22 00:01 stale[bot]

/remove stale

AndrewSav avatar Jan 16 '22 04:01 AndrewSav

same issue k8s: 1.23 helm: 3.8.0

yozhsh avatar Jan 28 '22 14:01 yozhsh

Hi, thanks @Jaskaranbir for the provided workaround 👍.

For those who want to execute the associated commands, please be aware that they act on all validatingwebhookconfiguration and mutatingwebhookconfiguration resources, which is probably not what you want in most cases (especially if you have a huge Kubernetes cluster).

Pick only the resources associated with the kube-prometheus-stack instead, as explained in comment https://github.com/prometheus-community/helm-charts/issues/108#issuecomment-825689328 (a label-selector variant is also sketched after the commands below).

In my case I executed the following commands.

# Check existing objects
kubectl get validatingwebhookconfiguration -A
kubectl get mutatingwebhookconfiguration -A

# Ignore failures
kubectl get validatingwebhookconfiguration NAME_RETURNED_IN_PREVIOUS_COMMAND -o yaml | sed "s/failurePolicy: Fail/failurePolicy: Ignore/" | kubectl apply -f -
kubectl get mutatingwebhookconfiguration NAME_RETURNED_IN_PREVIOUS_COMMAND -o yaml | sed "s/failurePolicy: Fail/failurePolicy: Ignore/" | kubectl apply -f -

# Perform Helm changes ...

# Revert policy changes
kubectl get validatingwebhookconfiguration NAME_RETURNED_IN_PREVIOUS_COMMAND -o yaml | sed "s/failurePolicy: Ignore/failurePolicy: Fail/" | kubectl apply -f -
kubectl get mutatingwebhookconfiguration NAME_RETURNED_IN_PREVIOUS_COMMAND -o yaml | sed "s/failurePolicy: Ignore/failurePolicy: Fail/" | kubectl apply -f -
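
If you prefer not to copy resource names around, the chart's webhook configurations are normally labelled, so a label selector can do the same in one go. A sketch, assuming the label is app=kube-prometheus-stack-admission (verify the actual label on your cluster first):

# Relax the failure policy for the chart-managed webhook configurations only
kubectl get validatingwebhookconfiguration,mutatingwebhookconfiguration \
  -l app=kube-prometheus-stack-admission -o yaml \
  | sed "s/failurePolicy: Fail/failurePolicy: Ignore/" \
  | kubectl apply -f -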

bgaillard avatar Feb 07 '22 18:02 bgaillard

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.

stale[bot] avatar Mar 19 '22 22:03 stale[bot]

/remove stale

AndrewSav avatar Mar 20 '22 00:03 AndrewSav

We are facing the same issue, frequently having to ignore the webhooks to proceed with small upgrades

  • kube-prometheus-stack 34.5.1
  • helm 3.8.0
  • kubernetes v1.22.5

MisterTimn avatar Mar 30 '22 10:03 MisterTimn

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.

stale[bot] avatar Apr 29 '22 22:04 stale[bot]

/remove stale

AndrewSav avatar Apr 30 '22 12:04 AndrewSav

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.

stale[bot] avatar May 31 '22 02:05 stale[bot]

/remove stale

AndrewSav avatar May 31 '22 04:05 AndrewSav

I have faced the same problem after deleting all the Grafana manifests and redeploying them again using:

helm install grafana grafana/grafana --set persistence.enabled=true

SelimBF avatar May 31 '22 09:05 SelimBF

You have to consider that "helm upgrade --install ..." doesn't work in my case, and the pod crashed, by the way.

SelimBF avatar May 31 '22 09:05 SelimBF