[kube-prometheus-stack] Prometheus Operator pod cannot come up when admission hook is disabled

Open AndrewSav opened this issue 2 years ago • 20 comments

Describe the bug

The Prometheus Operator pod fails to start with a missing admission webhook secret error when the admission webhook is disabled.

What's your helm version?

v3.7.0

What's your kubectl version?

v1.22.2

Which chart?

kube-prometheus-stack

What's the chart version?

19.0.2

What happened?

The operator pod fails to start with the following error: MountVolume.SetUp failed for volume "tls-secret" : secret "prometheus-kube-prometheus-admission" not found. The error occurs because the admission webhook is disabled, so the secret is never created.

What you expected to happen?

I expect the operator to come up.

How to reproduce it?

Install the chart with the values below

Enter the changed values of values.yaml

prometheusOperator:
  admissionWebhooks:
    enabled: false

Enter the command that you executed and that is failing or malfunctioning.

helm install prometheus prometheus-community/kube-prometheus-stack -n monitoring -f prometheus-values.yaml

Anything else we need to know?

No response

AndrewSav avatar Oct 18 '21 00:10 AndrewSav

I am also encountering this when trying to deploy prometheus-operator with admissionWebhooks disabled.

This happens because the prometheus-operator Deployment references the secret (https://github.com/prometheus-community/helm-charts/blob/0a55b7319e0397c2f4a82eb5f680b6a260301e8c/charts/kube-prometheus-stack/templates/prometheus-operator/deployment.yaml#L125), but the secret is only created by the admission-create job (https://github.com/prometheus-community/helm-charts/blob/main/charts/kube-prometheus-stack/templates/prometheus-operator/admission-webhooks/job-patch/job-createSecret.yaml#L1), which is not rendered when the webhooks are disabled.

A workaround (or maybe the intended behaviour?) is to set

prometheusOperator:
  tls:
    enabled: false

This will prevent helm from generating the volume and volumeMount blocks (https://github.com/prometheus-community/helm-charts/blob/0a55b7319e0397c2f4a82eb5f680b6a260301e8c/charts/kube-prometheus-stack/templates/prometheus-operator/deployment.yaml#L116).
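Put together, the values for this workaround might look like the sketch below. It only combines the two settings already quoted in this thread. The file name prometheus-values.yaml comes from the install command in the original report, and no other keys are assumed.

```yaml
# prometheus-values.yaml (sketch of the workaround described above)
prometheusOperator:
  admissionWebhooks:
    # No admission webhook, so the admission secret is never created.
    enabled: false
  tls:
    # Stops the chart from rendering the tls-secret volume and volumeMount
    # that reference the missing secret.
    enabled: false
```

Note that this turns TLS off for the operator rather than providing a certificate, so it may not suit every environment.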

However, this revealed another set of issues.

  1. Missing role & rolebindings for Prometheus-operator
level=error ts=2021-10-26T14:55:40.266312816Z caller=klog.go:116 component=k8s_client_runtime func=ErrorDepth msg="pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167: Failed to watch *v1.Probe: failed to list *v1.Probe: probes.monitoring.coreos.com is forbidden: User \"system:serviceaccount:test-tenant:tenant-foo-operator\" cannot list resource \"probes\" in API group \"monitoring.coreos.com\" in the namespace \"test-tenant\""
level=error ts=2021-10-26T14:55:42.581554026Z caller=klog.go:116 component=k8s_client_runtime func=ErrorDepth msg="pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167: Failed to watch *v1.PrometheusRule: failed to list *v1.PrometheusRule: prometheusrules.monitoring.coreos.com is forbidden: User \"system:serviceaccount:test-tenant:tenant-foo-operator\" cannot list resource \"prometheusrules\" in API group \"monitoring.coreos.com\" in the namespace \"test-tenant\""

The workaround is to create a Role and RoleBinding whose permissions match the rules in https://github.com/prometheus-community/helm-charts/blob/main/charts/kube-prometheus-stack/templates/prometheus-operator/clusterrole.yaml.

  2. Missing role & rolebindings for Prometheus
level=error ts=2021-10-26T14:56:42.254Z caller=klog.go:116 component=k8s_client_runtime func=ErrorDepth msg="pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167: Failed to watch *v1.Endpoints: failed to list *v1.Endpoints: endpoints is forbidden: User \"system:serviceaccount:test-tenant:tenant-foo-prometheus\" cannot list resource \"endpoints\" in API group \"\" in the namespace \"test-tenant\""
level=error ts=2021-10-26T14:56:54.706Z caller=klog.go:116 component=k8s_client_runtime func=ErrorDepth msg="pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167: Failed to watch *v1.Service: failed to list *v1.Service: services is forbidden: User \"system:serviceaccount:test-tenant:tenant-foo-prometheus\" cannot list resource \"services\" in API group \"\" in the namespace \"test-tenant\""
level=error ts=2021-10-26T14:56:58.193Z caller=klog.go:116 component=k8s_client_runtime func=ErrorDepth msg="pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167: Failed to watch *v1.Pod: failed to list *v1.Pod: pods is forbidden: User \"system:serviceaccount:test-tenant:tenant-foo-prometheus\" cannot list resource \"pods\" in API group \"\" in the namespace \"test-tenant\""

As above, the workaround is to create a Role and RoleBinding separately, using https://github.com/prometheus-community/helm-charts/blob/main/charts/kube-prometheus-stack/templates/prometheus/clusterrole.yaml as a reference.
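As a rough illustration of the two RBAC workarounds, the manifests below grant namespace-scoped read access for exactly the resources named in the error logs above. The namespace (test-tenant) and the service account names (tenant-foo-operator, tenant-foo-prometheus) are taken from those logs. The Role and RoleBinding names are hypothetical, and the rule list is deliberately incomplete: the chart's clusterrole.yaml templates linked above remain the reference for the full set of permissions.

```yaml
# Sketch only: covers the resources that appear in the "forbidden" log lines.
# Role/RoleBinding names are hypothetical; extend the rules from the chart's
# clusterrole.yaml templates as needed.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: tenant-foo-operator        # hypothetical name
  namespace: test-tenant
rules:
  - apiGroups: ["monitoring.coreos.com"]
    resources: ["probes", "prometheusrules"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: tenant-foo-operator        # hypothetical name
  namespace: test-tenant
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: tenant-foo-operator
subjects:
  - kind: ServiceAccount
    name: tenant-foo-operator      # service account from the operator logs
    namespace: test-tenant
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: tenant-foo-prometheus      # hypothetical name
  namespace: test-tenant
rules:
  - apiGroups: [""]                # core API group
    resources: ["endpoints", "services", "pods"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: tenant-foo-prometheus      # hypothetical name
  namespace: test-tenant
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: tenant-foo-prometheus
subjects:
  - kind: ServiceAccount
    name: tenant-foo-prometheus    # service account from the Prometheus logs
    namespace: test-tenant
```

Because these are Roles rather than ClusterRoles, the permissions apply only inside test-tenant, which is the point in a multi-tenant setup.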

Question to maintainers:

  1. Should we be setting tls.enabled: false if we are not intending to use admissionWebhooks?
  2. Any issue with creating role/rolebindings when clusterrole/clusterrolebindings are not needed or not applicable? (e.g. multi-tenant environment)

I can help create a PR to fix this.

jaanhio avatar Oct 26 '21 15:10 jaanhio

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.

stale[bot] avatar Nov 26 '21 05:11 stale[bot]

+1

AndrewSav avatar Nov 26 '21 10:11 AndrewSav

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.

stale[bot] avatar Dec 26 '21 13:12 stale[bot]

+1

AndrewSav avatar Dec 28 '21 19:12 AndrewSav

Works for me.

monotek avatar Dec 28 '21 20:12 monotek

@monotek do you have the prometheus-kube-prometheus-admission secret?

AndrewSav avatar Dec 28 '21 23:12 AndrewSav

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.

stale[bot] avatar Jan 28 '22 03:01 stale[bot]

Same issue with kube-prometheus-stack 31.0.0 on a fresh cluster. I disabled the admission webhooks, because I do not configure Prometheus in this way and there is no need for it to be running.

prometheusOperator:
  enabled: true
  admissionWebhooks:
    enabled: false

Looking at the code, it seems conceptually wrong that the prometheus-operator uses the same TLS certificates that are intended for the admission webhooks. It should generate its own certificates if needed, or there should be instructions on how to set them up.

  • https://github.com/prometheus-community/helm-charts/blob/main/charts/kube-prometheus-stack/templates/prometheus-operator/deployment.yaml#L126

Workaround 1: Disable TLS (traffic to operator is now unencrypted?):

  tls:
    enabled: false

Workaround 2: Enable the generation of admission webhook certificates with cert-manager even though the webhooks themselves are disabled (generated by certmanager.yaml#L42):

  admissionWebhooks:
    enabled: false
    certManager:
      enabled: true

Workaround 3: Manually create the needed TLS secrets/certificates.
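For Workaround 3, a manually created secret could look roughly like the sketch below. The secret name is the one from the error message in the original report (it depends on the release name), and the namespace is the one from the install command. The kubernetes.io/tls type and its tls.crt/tls.key keys are an assumption: the operator may expect different key names depending on chart version and settings, so check the arguments in the chart's deployment.yaml before relying on this. The certificate and key values are placeholders.

```yaml
# Sketch only: key names are an assumption, verify them against the
# operator Deployment's cert/key file arguments in the chart.
apiVersion: v1
kind: Secret
metadata:
  name: prometheus-kube-prometheus-admission   # name from the error message
  namespace: monitoring                        # namespace from the install command
type: kubernetes.io/tls
stringData:
  tls.crt: |
    <PEM-encoded certificate goes here (placeholder)>
  tls.key: |
    <PEM-encoded private key goes here (placeholder)>
```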

gw0 avatar Feb 05 '22 16:02 gw0

+1

obvionaoe avatar Feb 18 '22 12:02 obvionaoe

@gw0 thanks for recommending option 2, that seems to work for 33.1.0.

danmanners avatar Mar 02 '22 04:03 danmanners

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.

stale[bot] avatar Apr 02 '22 07:04 stale[bot]

recent activity

AndrewSav avatar Apr 02 '22 20:04 AndrewSav

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.

stale[bot] avatar May 03 '22 02:05 stale[bot]

More recent activity

danmanners avatar May 03 '22 03:05 danmanners

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.

stale[bot] avatar Jun 04 '22 01:06 stale[bot]

Even more recent activity

AndrewSav avatar Jun 04 '22 02:06 AndrewSav

@monotek can you or anyone take a look?

obvionaoe avatar Jun 13 '22 12:06 obvionaoe

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.

stale[bot] avatar Jul 13 '22 20:07 stale[bot]

Hello there

AndrewSav avatar Jul 13 '22 21:07 AndrewSav

I'm facing the same issue.

jkleinkauff avatar Sep 01 '22 00:09 jkleinkauff

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.

stale[bot] avatar Oct 12 '22 05:10 stale[bot]

remove stale

AndrewSav avatar Oct 12 '22 07:10 AndrewSav

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.

stale[bot] avatar Nov 12 '22 12:11 stale[bot]

/remove-lifecycle stale

AndrewSav avatar Nov 12 '22 20:11 AndrewSav

This issue is being automatically closed due to inactivity.

stale[bot] avatar Nov 27 '22 14:11 stale[bot]

Re-opened as https://github.com/prometheus-community/helm-charts/issues/2742 since it was closed by the bot.

AndrewSav avatar Nov 27 '22 20:11 AndrewSav