helm-charts
[prometheus-kube-stack] Some API server charts broken in 1.22.8
Describe the bug
apiserver_request_slo_duration_seconds_count does not seem to be available in Kubernetes 1.22.8. As a result, the latest version of the prometheus-kube-stack chart breaks some dashboard panels (and sets off some alarms):
https://github.com/kubernetes/kubernetes/blame/master/staging/src/k8s.io/apiserver/pkg/endpoints/metrics/metrics.go#L112-L124
https://github.com/kubernetes/kubernetes/commit/0afa569499d480df4977568454a50790891860f5
➜ ~ curl -s localhost:8001/metrics | grep slo_
➜ ~
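The empty grep result above can also be checked programmatically. The following is a minimal sketch, assuming the API server metrics are in the Prometheus text exposition format and have already been fetched as a string (for example from localhost:8001/metrics, as in the curl command above). The function name `metric_names_with` and the sample text are hypothetical and only serve as illustration.

```python
def metric_names_with(metrics_text: str, fragment: str) -> set[str]:
    """Return the names of all metrics whose name contains `fragment`."""
    names = set()
    for line in metrics_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # A sample line looks like `name{labels} value` or `name value`.
        name = line.split("{", 1)[0].split(" ", 1)[0]
        if fragment in name:
            names.add(name)
    return names


# Hypothetical sample, shaped like /metrics output from a cluster
# that does not expose the SLO metric.
sample = """\
# HELP apiserver_request_duration_seconds Response latency distribution.
# TYPE apiserver_request_duration_seconds histogram
apiserver_request_duration_seconds_count{verb="GET"} 12
apiserver_request_total{code="200",verb="GET"} 12
"""

print(metric_names_with(sample, "slo_"))  # prints set()
```

An empty set corresponds to the empty grep output: no metric with `slo_` in its name is exposed.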
What's your helm version?
v3.8.2
What's your kubectl version?
v1.23.3
Which chart?
prometheus-kube-stack
What's the chart version?
35.0.3
What happened?
No response
What you expected to happen?
No response
How to reproduce it?
No response
Enter the changed values of values.yaml?
No response
Enter the command that you execute and failing/misfunctioning.
N/A
Anything else we need to know?
No response
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.
Kubernetes 1.23 added apiserver_request_slo_duration_seconds with StabilityLevel: compbasemetrics.ALPHA.
https://github.com/kubernetes-monitoring/kubernetes-mixin uses that metric on the default branch, which is documented as compatible with 1.23+. It does not use it on the release-0.10 branch for v1.20+.
https://github.com/prometheus-operator/kube-prometheus automatically gets the default branch of the mixin and commits it, every Monday morning. Consequently, their default branch is documented as compatible with 1.23. They do have a branch compatible with 1.22 (though not 1.20), which incidentally is called release-0.10.
https://github.com/prometheus-community/helm-charts has a script to get rules from the default branch of kube-prometheus. This is run manually, but I'm not sure under what circumstances. The chart does not really document compatibility with any Kubernetes version.
So I think to fix this the chart's dependency management would have to change.
This commit is the one that broke things for clusters older than 1.23: https://github.com/prometheus-community/helm-charts/commit/d6c45e97eca55e6212ef8acf546b45aa7851c72e#diff-17d39e87761642b2b404590d8819508b1155238fa7f1c35842b4696d6d2554d7
/remove lifecycle/stale
This issue is being automatically closed due to inactivity.
I believe this issue needs to be re-opened since compatibility isn't managed in a good way at this moment. See @tewe's post earlier.
It seems the metric names for this one have changed in Kubernetes v1.28.x:
original: apiserver_request_slo*
now: apiserver_request_sli*
I replaced the old name in every rule, and now all charts using apiserver_request_sli_duration_seconds_count work like a charm.
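The rename described above can be sketched as a small text rewrite. This is only an illustration, assuming the rule expressions are available as plain strings. The function name `rename_slo_to_sli` and the example expression are hypothetical, and the actual change made to the chart's rules may look different.

```python
import re

# Match the old prefix only where it starts a metric name, so that
# a longer name merely containing it is left untouched.
_OLD_PREFIX = re.compile(r"\bapiserver_request_slo_")


def rename_slo_to_sli(expr: str) -> str:
    """Replace apiserver_request_slo_* with apiserver_request_sli_* in an expression."""
    return _OLD_PREFIX.sub("apiserver_request_sli_", expr)


# Hypothetical rule expression, for illustration only.
old_expr = 'sum(rate(apiserver_request_slo_duration_seconds_count{verb="GET"}[5m]))'
print(rename_slo_to_sli(old_expr))
```

Expressions that do not mention the old prefix pass through unchanged, so the function can be applied to every rule without first filtering them.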
A solution for v1.28.x is here: https://github.com/prometheus-community/helm-charts/issues/3816