fix: allow opting into upstream probes
Description
Allow users to opt-into upstream probe definitions.
Type of change
What type of changes does your code introduce to kube-prometheus? Put an `x` in the boxes that apply.
- [ ] CHANGE (fix or feature that would cause existing functionality to not work as expected)
- [ ] FEATURE (non-breaking change which adds functionality)
- [x] BUGFIX (non-breaking change which fixes an issue)
- [ ] ENHANCEMENT (non-breaking change which improves existing functionality)
- [ ] NONE (if none of the other choices apply; e.g., tooling, build system, CI, docs, etc.)
Changelog entry
Please put a one-line changelog entry below. Later this will be copied to the changelog file.
Users can set `upstreamProbesOptIn` (boolean, `false` by default) to sync KSM with upstream's health probe definitions.
Tested downstream (CMO) with `upstreamProbesOptIn` enabled.
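For reference, enabling the flag from a downstream jsonnet overlay would look roughly like the sketch below. The field placement follows the `main.jsonnet` diff in this PR; the import path and surrounding `values` structure are assumed from the usual kube-prometheus layout.

```jsonnet
// Sketch: opting into upstream probe definitions for kube-state-metrics.
// `upstreamProbesOptIn` defaults to false; setting it to true syncs KSM's
// liveness/readiness probes with the upstream definitions.
local kp =
  (import 'github.com/prometheus-operator/kube-prometheus/jsonnet/kube-prometheus/main.libsonnet') +
  {
    values+:: {
      kubeStateMetrics+: {
        upstreamProbesOptIn: true,
      },
    },
  };
```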
diff --git a/assets/kube-state-metrics/deployment.yaml b/assets/kube-state-metrics/deployment.yaml
index ec651432e..e73091269 100644
--- a/assets/kube-state-metrics/deployment.yaml
+++ b/assets/kube-state-metrics/deployment.yaml
@@ -57,7 +57,19 @@ spec:
^kube_pod_completion_time$,
^kube_pod_status_scheduled$
image: registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.13.0
+ livenessProbe:
+ httpGet:
+ path: /livez
+ port: 8080
+ initialDelaySeconds: 5
+ timeoutSeconds: 5
name: kube-state-metrics
+ readinessProbe:
+ httpGet:
+ path: /livez
+ port: 8080
+ initialDelaySeconds: 5
+ timeoutSeconds: 5
resources:
requests:
cpu: 2m
diff --git a/jsonnet/main.jsonnet b/jsonnet/main.jsonnet
index 7539fb456..ffabfb4f0 100644
--- a/jsonnet/main.jsonnet
+++ b/jsonnet/main.jsonnet
@@ -216,6 +216,7 @@ local inCluster =
kubeRbacProxyImage: $.values.common.images.kubeRbacProxy,
commonLabels+: $.values.common.commonLabels,
mixin+: { ruleLabels: $.values.common.ruleLabels },
+ upstreamProbesOptIn: true,
},
nodeExporter: {
namespace: $.values.common.namespace,
@@ -439,7 +440,7 @@ local inCluster =
openshiftStateMetrics: openshiftStateMetrics($.values.openshiftStateMetrics),
} +
(import './utils/anti-affinity.libsonnet') +
- (import 'github.com/prometheus-operator/kube-prometheus/jsonnet/kube-prometheus/addons/ksm-lite.libsonnet') +
+ (import 'github.com/rexagod/kube-prometheus/jsonnet/kube-prometheus/addons/ksm-lite.libsonnet') +
(import './utils/ibm-cloud-managed-profile.libsonnet') +
(import './components/metrics-server-audit.libsonnet') +
{}; // Including empty object to simplify adding and removing imports during development
- [x] Add a comment describing the way probes are defined and expected to work.
@simonpasquier Pushed all the suggested changes, except for s/defaults.ignorePaths/[]/ and s/super.ignorePaths/[]/, since I believe these will always be [], and the current implementation also allows for a single source of truth for the default value.
I generated the manifest with enableProbes: true but Kubernetes isn't happy with it:
$ kc events
LAST SEEN TYPE REASON OBJECT MESSAGE
6m20s (x2355 over 21h) Warning Unhealthy Pod/kube-state-metrics-68fccdc5f7-9wzdn Liveness probe errored: strconv.Atoi: parsing "http-metrics": invalid syntax
80s (x2703 over 21h) Warning Unhealthy Pod/kube-state-metrics-68fccdc5f7-9wzdn Readiness probe errored: strconv.Atoi: parsing "telemetry": invalid syntax
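The errors above come from the kubelet failing to resolve the named probe ports: `port: http-metrics` and `port: telemetry` are resolved against the port list of the same container the probe is defined on, and when no matching name is found, the name is parsed as an integer (hence `strconv.Atoi`). A minimal sketch of the shape that resolves correctly, following the deployment diff below (port numbers and names taken from this PR):

```yaml
# Sketch: a named probe port must be declared on the same container,
# otherwise the kubelet attempts strconv.Atoi on the name and the
# probe errors out as shown above.
containers:
  - name: kube-state-metrics
    ports:
      - containerPort: 8443
        name: http-metrics
      - containerPort: 9443
        name: telemetry
    livenessProbe:
      httpGet:
        path: /livez
        port: http-metrics  # resolves against this container's ports
        scheme: HTTPS
```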
Tested on a cluster; seems to work now. I've made KRP's ports the single source of truth for KSM's probes and ports for convenience. PTAL at the downstream diff below.
diff --git a/assets/kube-state-metrics/deployment.yaml b/assets/kube-state-metrics/deployment.yaml
index ec651432e..3b444f081 100644
--- a/assets/kube-state-metrics/deployment.yaml
+++ b/assets/kube-state-metrics/deployment.yaml
@@ -57,7 +57,22 @@ spec:
^kube_pod_completion_time$,
^kube_pod_status_scheduled$
image: registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.13.0
+ livenessProbe:
+ httpGet:
+ path: /livez
+ port: http-metrics
+ scheme: HTTPS
name: kube-state-metrics
+ ports:
+ - containerPort: 8443
+ name: http-metrics
+ - containerPort: 9443
+ name: telemetry
+ readinessProbe:
+ httpGet:
+ path: /readyz
+ port: telemetry
+ scheme: HTTPS
resources:
requests:
cpu: 2m
@@ -75,6 +90,7 @@ spec:
- --secure-listen-address=:8443
- --tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305
- --upstream=http://127.0.0.1:8081/
+ - --ignore-paths=/livez
- --tls-cert-file=/etc/tls/private/tls.crt
- --tls-private-key-file=/etc/tls/private/tls.key
- --client-ca-file=/etc/tls/client/client-ca.crt
@@ -83,7 +99,7 @@ spec:
name: kube-rbac-proxy-main
ports:
- containerPort: 8443
- name: https-main
+ name: http-metrics
resources:
requests:
cpu: 1m
@@ -104,6 +120,7 @@ spec:
- --secure-listen-address=:9443
- --tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305
- --upstream=http://127.0.0.1:8082/
+ - --ignore-paths=/readyz
- --tls-cert-file=/etc/tls/private/tls.crt
- --tls-private-key-file=/etc/tls/private/tls.key
- --client-ca-file=/etc/tls/client/client-ca.crt
@@ -112,7 +129,7 @@ spec:
name: kube-rbac-proxy-self
ports:
- containerPort: 9443
- name: https-self
+ name: telemetry
resources:
requests:
cpu: 1m
Needs an approval (for workflows).
https://github.com/prometheus-operator/kube-prometheus/actions/runs/11370566998/job/31653537858?pr=2505 fails on kube-state-metrics check for metrics.
Fixed, needs workflow approval.
Re-ping, @simonpasquier for another look here. 🙂
Since port definitions use the port numbers as keys, I believe there's no workaround for defining the same ones on KSM without server-side apply warnings. I'll open a ticket/PR upstream to see if we can address this there.
Actually, the fact that the kubelet looks at KSM's own port definitions but effectively probes them through the proxy (which succeeds when the port definitions are duplicated) suggests we'd need to bring this up with the upstream k/k community to address this particular use-case.
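To illustrate the duplication in question (a sketch based on the deployment diff above; server-side apply warns because two containers in the same pod declare the same port number):

```yaml
# KSM must redeclare the port so `port: http-metrics` resolves for the
# kubelet, while kube-rbac-proxy is the container that actually binds it.
# The overlapping containerPort is what triggers the SSA warnings.
containers:
  - name: kube-state-metrics
    ports:
      - containerPort: 8443   # declared only so the named probe port resolves
        name: http-metrics
  - name: kube-rbac-proxy-main
    ports:
      - containerPort: 8443   # the port that is actually bound
        name: http-metrics
```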