helm-charts

[prometheus-kube-stack] Some API server charts broken in 1.22.8

Open centromere opened this issue 2 years ago • 6 comments

Describe the bug

apiserver_request_slo_duration_seconds_count does not seem to be available in Kubernetes 1.22.8. As a result, the latest version of the prometheus-kube-stack chart breaks some panels (and sets off some alarms):

Screen Shot 2022-04-29 at 18 59 05

Screen Shot 2022-04-29 at 19 02 05

https://github.com/kubernetes/kubernetes/blame/master/staging/src/k8s.io/apiserver/pkg/endpoints/metrics/metrics.go#L112-L124

https://github.com/kubernetes/kubernetes/commit/0afa569499d480df4977568454a50790891860f5

Screen Shot 2022-04-29 at 19 04 47

➜  ~ curl -s localhost:8001/metrics | grep slo_
➜  ~
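
The empty `grep` output above means the API server exposes no metric whose name contains `slo_`. As a minimal sketch of the same check in Python, assuming the metrics text has already been fetched (for example from `localhost:8001/metrics` as above): the function name `metric_families` and the sample text are made up for illustration and are not real output from this cluster.

```python
def metric_families(exposition_text, needle):
    """Return the metric names containing `needle` in Prometheus text-format output."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # The metric name ends at the first '{' (labels) or space (value).
        name = line.split("{", 1)[0].split(" ", 1)[0]
        if needle in name:
            names.add(name)
    return names


# Hypothetical sample resembling a 1.22 API server: no *_slo_* series present.
sample = """\
# HELP apiserver_request_duration_seconds Response latency distribution.
# TYPE apiserver_request_duration_seconds histogram
apiserver_request_duration_seconds_count{verb="GET"} 12
apiserver_request_total{code="200",verb="GET"} 12
"""

print(metric_families(sample, "slo_"))       # empty set: the metric is absent
print(metric_families(sample, "duration_"))  # names that do exist
```

An empty result for `slo_` corresponds to the empty `grep` output, which is why panels and alerts built on that metric have no data.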

What's your helm version?

v3.8.2

What's your kubectl version?

v1.23.3

Which chart?

prometheus-kube-stack

What's the chart version?

35.0.3

What happened?

No response

What you expected to happen?

No response

How to reproduce it?

No response

Enter the changed values of values.yaml?

No response

Enter the command that you execute and failing/misfunctioning.

N/A

Anything else we need to know?

No response

centromere, Apr 29 '22 23:04

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.

stale[bot], May 31 '22 02:05

Kubernetes 1.23 added apiserver_request_slo_duration_seconds with StabilityLevel: compbasemetrics.ALPHA.

https://github.com/kubernetes-monitoring/kubernetes-mixin uses that metric on the default branch, which is documented as compatible with 1.23+. It does not use it on the release-0.10 branch for v1.20+.

https://github.com/prometheus-operator/kube-prometheus automatically pulls the default branch of the mixin and commits it every Monday morning. Consequently, their default branch is documented as compatible with 1.23. They do have a branch compatible with 1.22 (though not 1.20), which incidentally is also called release-0.10.

https://github.com/prometheus-community/helm-charts has a script to get rules from the default branch of kube-prometheus. This is run manually, but I'm not sure under what circumstances. The chart does not really document compatibility with any Kubernetes version.

So I think fixing this would require changing how the chart manages its dependencies.
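
To make the compatibility chain above concrete, here is a rough sketch of the kind of version check that dependency management would need. It only encodes what this comment says about kubernetes-mixin (default branch for 1.23+, release-0.10 for v1.20+); the function names `parse_minor` and `mixin_branch` are hypothetical, and nothing like this exists in the chart.

```python
def parse_minor(version):
    """Turn 'v1.22.8' or '1.23' into a (major, minor) tuple such as (1, 22)."""
    parts = version.lstrip("v").split(".")
    return int(parts[0]), int(parts[1])


def mixin_branch(k8s_version):
    """Pick a kubernetes-mixin branch from the compatibility described above."""
    major_minor = parse_minor(k8s_version)
    if major_minor >= (1, 23):
        # The default branch uses apiserver_request_slo_duration_seconds.
        return "default branch"
    if major_minor >= (1, 20):
        # release-0.10 does not use the slo metric.
        return "release-0.10"
    raise ValueError(f"no documented mixin branch for Kubernetes {k8s_version}")


print(mixin_branch("v1.22.8"))  # the cluster version from this issue
print(mixin_branch("v1.23.3"))
```

For the 1.22.8 cluster in this issue the sketch selects release-0.10, whereas the chart currently takes its rules from the default branch regardless of cluster version.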

tewe, May 31 '22 07:05

This commit broke things for clusters older than 1.23: https://github.com/prometheus-community/helm-charts/commit/d6c45e97eca55e6212ef8acf546b45aa7851c72e#diff-17d39e87761642b2b404590d8819508b1155238fa7f1c35842b4696d6d2554d7

HaveFun83, Jun 01 '22 09:06

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.

stale[bot], Jul 07 '22 00:07

/remove lifecycle/stale

aaroniscode, Jul 07 '22 01:07

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.

stale[bot], Aug 10 '22 04:08

This issue is being automatically closed due to inactivity.

stale[bot], Sep 20 '22 18:09

I believe this issue needs to be re-opened, since compatibility still isn't managed well. See @tewe's post earlier.

ZF-fredericvanlinthoudt, Jan 09 '23 13:01

It seems the names of these metrics have changed in k8s v1.28.x:

original: apiserver_request_slo*

now: apiserver_request_sli*

I replaced the old name in every rule, and now all charts using apiserver_request_sli_duration_seconds_count work like a charm.
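
The rename described above can be sketched as a simple text substitution over rule expressions. This is only an illustration, assuming the prefix change from `apiserver_request_slo_` to `apiserver_request_sli_` is the only difference; the helper name `rename_slo_to_sli` and the sample expression are hypothetical, not taken from the chart.

```python
import re

# Match the old prefix only where a metric name starts, so that longer
# names merely ending in the same text are left alone.
_OLD_PREFIX = re.compile(r"\bapiserver_request_slo_")


def rename_slo_to_sli(expr):
    """Rewrite apiserver_request_slo_* metric names to apiserver_request_sli_*."""
    return _OLD_PREFIX.sub("apiserver_request_sli_", expr)


# Hypothetical rule expression, for illustration only.
expr = 'sum(rate(apiserver_request_slo_duration_seconds_count{verb="GET"}[5m]))'
print(rename_slo_to_sli(expr))
```

Applying the substitution to the rule files before installing them is one way to avoid editing each rule by hand, but review the result, since dashboards may reference the metric as well.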

JoeApo108, Sep 21 '23 14:09

Solution for v1.28.x here: https://github.com/prometheus-community/helm-charts/issues/3816

JoeApo108, Sep 21 '23 14:09