
[kube-prometheus-stack] Alert "KubeClientCertificateExpiration" expression output showing wrong values

Open jaisegrg opened this issue 2 years ago • 4 comments

Describe the bug a clear and concise description of what the bug is.

The kube-prometheus-stack Helm chart is installed in an AKS cluster, but the "KubeClientCertificateExpiration" alert shows wrong values for the expression output. I validated the "kube-apiserver" certificate expiration by converting the output value from seconds to days, but the result does not match the alerts.

Version: kube-prometheus-stack-45.8.1 v0.63.0

  - alert: KubeClientCertificateExpiration
    annotations:
      description: A client certificate used to authenticate to kubernetes apiserver
        is expiring in less than 7.0 days.
      runbook_url: https://runbooks.prometheus-operator.dev/runbooks/kubernetes/kubeclientcertificateexpiration
      summary: Client certificate is about to expire.
    expr: apiserver_client_certificate_expiration_seconds_count{job="apiserver"}
      > 0 and on(job) histogram_quantile(0.01, sum by (job, le) (rate(apiserver_client_certificate_expiration_seconds_bucket{job="apiserver"}[5m])))
      < 604800
    for: 5m
    labels:
      severity: warning

  - alert: KubeClientCertificateExpiration
    annotations:
      description: A client certificate used to authenticate to kubernetes apiserver
        is expiring in less than 24.0 hours.
      runbook_url: https://runbooks.prometheus-operator.dev/runbooks/kubernetes/kubeclientcertificateexpiration
      summary: Client certificate is about to expire.
    expr: apiserver_client_certificate_expiration_seconds_count{job="apiserver"}
      > 0 and on(job) histogram_quantile(0.01, sum by (job, le) (rate(apiserver_client_certificate_expiration_seconds_bucket{job="apiserver"}[5m])))
      < 86400
    for: 5m
    labels:
      severity: critical


What's your helm version?

version.BuildInfo{Version:"v3.8.0", GitCommit:"d14138609b01886f544b2025f5000351c9eb092e", GitTreeState:"clean", GoVersion:"go1.17.5"}

What's your kubectl version?

Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.4", GitCommit:"e6c093d87ea4cbb530a7b2ae91e54c0842d8308a", GitTreeState:"clean", BuildDate:"2022-02-16T12:38:05Z", GoVersion:"go1.17.7", Compiler:"gc", Platform:"linux/amd64"} Server Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.5", GitCommit:"fd6aae27a28fca7e8b996d7201b0da6fbf6f732a", GitTreeState:"clean", BuildDate:"2023-04-08T13:27:20Z", GoVersion:"go1.19.4", Compiler:"gc", Platform:"linux/amd64"}

Which chart?

prometheus-community/kube-prometheus-stack

What's the chart version?

kube-prometheus-stack-45.8.1 v0.63.0

What happened?

No response

What you expected to happen?

No response

How to reproduce it?

No response

Enter the changed values of values.yaml?

No response

Enter the command that you execute and failing/misfunctioning.

helm upgrade --install prometheus-central \
  --namespace monitoring \
  prometheus-community/kube-prometheus-stack

Anything else we need to know?

No response

jaisegrg avatar May 29 '23 05:05 jaisegrg

I faced the same issue when I restarted Prometheus. After I upgraded Prometheus to the latest version, 2.24.0, and restarted it again, the problem disappeared.

ykfq avatar Jun 14 '23 03:06 ykfq

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.

stale[bot] avatar Aug 07 '23 05:08 stale[bot]

Hi there, I encountered the same issue, and I don't think the expr query works as expected. I think the apiserver_client_certificate_expiration_seconds_count{job="apiserver"} > 0 and on(job) part is not correct. As a result, the value of the whole query is not decreasing but monotonically increasing. I know it should be fixed upstream in https://github.com/kubernetes-monitoring/kubernetes-mixin

koooge avatar Jul 05 '24 13:07 koooge

As a workaround, this worked for me:

histogram_quantile(0.01, sum by (job, le) (rate(apiserver_client_certificate_expiration_seconds_bucket{job="apiserver"}[5m])))
and on (job) apiserver_client_certificate_expiration_seconds_count{job="apiserver"} > 0

refs https://github.com/kubernetes-monitoring/kubernetes-mixin/pull/941
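
For context on why swapping the operands helps: in PromQL, `and` carries over the sample values from its left-hand side. With the counter `apiserver_client_certificate_expiration_seconds_count` on the left, the alert's output value is therefore the ever-growing observation count rather than the seconds until expiry. The sketch below applies the same swap to the warning rule while keeping its `< 604800` (7 days) threshold. It is an illustration under that assumption and may not match what the linked PR actually merged.

```yaml
# Sketch only: the warning rule from this issue with the two sides of `and`
# swapped. The quantile (seconds until expiry) is now the left-hand side,
# so it becomes the value reported by the alert.
- alert: KubeClientCertificateExpiration
  annotations:
    description: A client certificate used to authenticate to kubernetes apiserver
      is expiring in less than 7.0 days.
    summary: Client certificate is about to expire.
  expr: |
    histogram_quantile(0.01, sum by (job, le) (rate(apiserver_client_certificate_expiration_seconds_bucket{job="apiserver"}[5m])))
      < 604800
    and on (job)
    apiserver_client_certificate_expiration_seconds_count{job="apiserver"}
      > 0
  for: 5m
  labels:
    severity: warning
```

Comparison operators bind more tightly than `and`, so the expression reads as `(quantile < 604800) and on (job) (count > 0)`. The critical rule would follow the same pattern with `< 86400`.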

koooge avatar Jul 05 '24 13:07 koooge

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.

stale[bot] avatar Apr 26 '25 00:04 stale[bot]

This issue is being automatically closed due to inactivity.

stale[bot] avatar Jul 18 '25 22:07 stale[bot]