[kube-prometheus-stack] failure in helm deployment - failed calling webhook "prometheusrulemutate.monitoring.coreos.com"

Open thomas-vt opened this issue 2 years ago • 17 comments

Describe the bug

Error: UPGRADE FAILED: failed to create resource: Internal error occurred: failed calling webhook "prometheusrulemutate.monitoring.coreos.com": failed to call webhook: Post "https://kube-prometheus-stack-operator.infra.svc:443/admission-prometheusrules/mutate?timeout=10s": x509: certificate signed by unknown authority

What's your helm version?

v3.11.2

What's your kubectl version?

Client Version: v1.25.2

Which chart?

kube-prometheus-stack

What's the chart version?

kube-prometheus-stack-45.7.1

What happened?

Error in installing the helm chart

Error: UPGRADE FAILED: failed to create resource: Internal error occurred: failed calling webhook "prometheusrulemutate.monitoring.coreos.com": failed to call webhook: Post "https://kube-prometheus-stack-operator.infra.svc:443/admission-prometheusrules/mutate?timeout=10s": x509: certificate signed by unknown authority

What you expected to happen?

No response

How to reproduce it?

No response

Enter the changed values of values.yaml?

No response

Enter the command that you execute and failing/misfunctioning.

helm upgrade --install kube-prometheus-stack prometheus-community/kube-prometheus-stack

Anything else we need to know?

After changing values.yaml as follows, the install succeeds:

admissionWebhooks:
  enabled: false
  patch:
    enabled: false

thomas-vt avatar Mar 21 '23 22:03 thomas-vt

Setting

admissionWebhooks:
  failurePolicy: Ignore

seems to resolve the issue

thomas-vt avatar Mar 22 '23 20:03 thomas-vt

I have the same exact issue installing kube-prometheus-stack 45.7.1 on Kubernetes 1.25.5

Internal error occurred: failed calling webhook "prometheusrulemutate.monitoring.coreos.com": failed to call webhook: Post "https://tnk-ea-prometheus-kube-pro-operator.prometheus.svc:443/admission-prometheusrules/mutate?timeout=10s": x509: certificate signed by unknown authority

n3wt0n avatar Mar 23 '23 09:03 n3wt0n

I am facing the same issue while installing kube-prometheus-stack 45.7.1 on GKE 1.25

Error: UPGRADE FAILED: failed to create resource: Internal error occurred: failed calling webhook "prometheusrulemutate.monitoring.coreos.com": failed to call webhook: Post "https://monitoring-kube-prometheus-operator.monitoring.svc:443/admission-prometheusrules/mutate?timeout=10s": x509: certificate signed by unknown authority

narchova avatar Mar 23 '23 18:03 narchova

We are having the exact same issue using version 45.7.1. We are trying to install it on a completely fresh Kubernetes Cluster. We see the following error message:

Error: UPGRADE FAILED: failed to create resource: Internal error occurred: failed calling webhook "prometheusrulemutate.monitoring.coreos.com": Post "https://kube-prometheus-stack-operator.monitoring.svc:443/admission-prometheusrules/mutate?timeout=10s": x509: certificate signed by unknown authority

erNail avatar Mar 30 '23 13:03 erNail

@erNail I used the image below instead of v1.5.1 for the patch job, and it worked:

    repository: ingress-nginx/kube-webhook-certgen
    tag: v1.3.0

narchova avatar Mar 30 '23 14:03 narchova

@narchova Where exactly did you change these values? I can't find anything about it in the kube-prometheus-stack Helm chart

erNail avatar Mar 30 '23 14:03 erNail

@erNail In the Helm chart's values.yaml file, you will see the following:

patch:
  enabled: true
  image:
    registry: registry.k8s.io
    repository: ingress-nginx/kube-webhook-certgen
    tag: v20221220-controller-v1.5.1-58-g787ea74b6
    sha: ""
    pullPolicy: IfNotPresent
  resources: {}

Use:

    repository: ingress-nginx/kube-webhook-certgen
    tag: v1.3.0

narchova avatar Mar 30 '23 14:03 narchova

@narchova Thank you for the suggestion, but we're still getting the same error message

erNail avatar Mar 30 '23 15:03 erNail

@erNail As a workaround, I set the following in the values.yaml file:

admissionWebhooks:
  failurePolicy: Ignore

narchova avatar Mar 30 '23 15:03 narchova

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.

stale[bot] avatar May 01 '23 17:05 stale[bot]

Any update on this issue? Ignoring admission webhook failures looks like a very crude workaround.

tbe avatar May 02 '23 00:05 tbe

This is still relevant; I see it on version 46.4.1 as well.

ospiegel91 avatar Jun 04 '23 14:06 ospiegel91

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.

stale[bot] avatar Aug 10 '23 02:08 stale[bot]

I guess this issue was resolved within this PR? At least in my Minikube environment the error does not occur anymore, but I have yet to try it in a "real" cluster.

erNail avatar Aug 12 '23 10:08 erNail

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.

stale[bot] avatar Sep 17 '23 06:09 stale[bot]

Still relevant (kube-prometheus-stack 51.9.4)

hostalp avatar Oct 19 '23 22:10 hostalp

The problem is that if you install the chart with ArgoCD (not with a helm install), it will always fail on the first iteration.

The root cause for this:

  • A pre-install hook generates a secret containing a self-signed certificate.
  • At sync time, the mutating and validating webhook configuration objects are created, but without the CA bundle added to them.
  • This is the problem: at this point, any PrometheusRule object you try to deploy is rejected, because the mutating webhook call fails with this certificate error. The webhook configurations do not have the CA bundle yet; it is only added in a post-sync hook.
  • The job that patches the mutating/validating webhook configurations runs as a post-install hook, so if the PrometheusRule objects never deploy successfully, the job is never executed.
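
One way to confirm this state is to check whether the webhook configurations actually carry a CA bundle. The following is a minimal sketch, not part of the chart: the function name `webhooks_missing_ca` is hypothetical, and it assumes the input is the parsed JSON that `kubectl get mutatingwebhookconfigurations -o json` (or the validating equivalent) returns, where each webhook stores its CA under `clientConfig.caBundle`.

```python
def webhooks_missing_ca(doc: dict) -> list[str]:
    """Return 'configuration/webhook' names whose clientConfig has no caBundle.

    Hypothetical helper: `doc` is assumed to be parsed `kubectl ... -o json` output.
    """
    # kubectl returns a List object with "items" for multiple resources,
    # or the object itself when a single resource is requested by name.
    configs = doc.get("items", [doc])
    missing = []
    for cfg in configs:
        cfg_name = cfg.get("metadata", {}).get("name", "<unnamed>")
        for hook in cfg.get("webhooks") or []:
            # An absent or empty caBundle means the patch job has not run yet.
            if not hook.get("clientConfig", {}).get("caBundle"):
                missing.append(f"{cfg_name}/{hook.get('name', '<unnamed>')}")
    return missing
```

To use it, load the kubectl output with `json.load` (for example from a file or stdin) and pass the result in. If the webhook named in the error message shows up in the returned list, the patch job has most likely not run yet.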

A workaround/solution (works nicely with ArgoCD; if you don't use ArgoCD, add Helm hook annotations instead):

Change the patch job to run in the Sync phase:


prometheusOperator:
  admissionWebhooks:
    patch:
      annotations:
        ## Needed; otherwise the mutating/validating webhooks never get patched on a fresh installation.
        ## With this configuration, the patch job runs at Sync time (after the pre-hooks).
        argocd.argoproj.io/hook: Sync

druanoor avatar Jan 19 '24 10:01 druanoor