descheduler
Prometheus: TLS certificate does not contain IP SANs
What version of descheduler are you using?
descheduler version: v0.26.0
Does this issue reproduce with the latest release? yes!
Which descheduler CLI options are you using?
- args:
  - --policy-config-file
  - /policy-dir/policy.yaml
  - --descheduling-interval
  - 5m
  - --v
  - "3"
  - --leader-elect=true
  - --leader-elect-lease-duration=15s
  - --leader-elect-renew-deadline=10s
  - --leader-elect-retry-period=2s
  - --leader-elect-resource-lock=leases
  - --leader-elect-resource-name=descheduler
  - --leader-elect-resource-namespace=kube-system
  command:
  - /bin/descheduler
Please provide a copy of your descheduler policy config file
N/A
What k8s version are you using (kubectl version)? v1.24.12
What did you do?
Let Prometheus scrape metrics from the descheduler.
What did you expect to see?
A TLS certificate that allows Prometheus to scrape metrics would be nice.
What did you see instead?
The descheduler creates a TLS certificate that does not include an IP address, causing Prometheus to refuse to scrape even when instructed to skip TLS verification:
Get "https://172.30.95.137:10258/metrics": x509: cannot validate certificate for 172.30.95.137 because it doesn't contain any IP SANs
Looks like adding the Pod IP address to the certificate would be sufficient. However, I am aware that IP address auto-detection might be tricky if the Pod has multiple interfaces attached (e.g., with Multus). Possible solutions (see the sketch after this list):
- Resolve the Pod's own IP address through kube-dns
- Use the Kubernetes Downward API and a command-line argument to pass the IP address
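For the Downward API option, a minimal sketch of what the container spec could look like, assuming a hypothetical flag (called --tls-san here, which does not exist today) were added so the descheduler could include extra SANs in its generated serving certificate; POD_IP is filled in by the Downward API from status.podIP:

- args:
  - --policy-config-file
  - /policy-dir/policy.yaml
  # Hypothetical flag: pass the Pod IP so it can be added as an IP SAN
  # to the self-signed serving certificate (not implemented yet).
  - --tls-san=$(POD_IP)
  command:
  - /bin/descheduler
  env:
  # Downward API: expose the Pod's own IP address to the container
  - name: POD_IP
    valueFrom:
      fieldRef:
        fieldPath: status.podIP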
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
Still an open issue …
/remove-lifecycle stale
Just ran into this as well. Something I was wondering: why is the descheduler serving over HTTPS in the first place? There seems to be some justification in that the Kubernetes project as a whole (kube-apiserver, etc.) has deprecated insecure connections, but as best I can tell, only the metrics endpoint is being served, which is unlikely to be exposed publicly. Even if it were, all connections to it would fail since the provided TLS cert won't be valid...?
Is there a way to actually have Prometheus scrape these metrics? How are people collecting them?
Based on https://github.com/kubernetes-sigs/descheduler/issues/842 and https://github.com/kubernetes-sigs/descheduler/issues/1095, it looks like this has been raised before, but there is still no meaningful way to support Prometheus scraping. I don't necessarily disagree with HTTPS being required, but if there is some way to have Prometheus scrape the metrics, could that at least be documented?
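For reference, this is roughly the kind of ServiceMonitor people seem to end up trying (assuming the Prometheus Operator is in use and that a Service labelled app: descheduler in kube-system exposes port 10258 under the name metrics; those names are assumptions, not something the descheduler ships with). Per the original report, skipping TLS verification did not resolve the error in that setup, so treat this only as a sketch of the commonly attempted configuration, not a confirmed working one:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: descheduler
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: descheduler          # assumed label on the descheduler Service
  namespaceSelector:
    matchNames:
    - kube-system
  endpoints:
  - port: metrics               # assumed name of the 10258 port on the Service
    scheme: https
    tlsConfig:
      insecureSkipVerify: true  # skip validation of the self-signed certificate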
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
In response to this:
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.