kube-prometheus
prometheus-adapter: failed querying node metrics
What happened? When using network policies, I noticed that the Kubernetes dashboard was no longer showing metrics. This is caused by traffic from prometheus-adapter to prometheus-k8s being rejected, as confirmed by the prometheus-adapter logs, which contain the following error:
E0521 13:20:03.515525 1 provider.go:272] failed querying node metrics: unable to fetch node CPU metrics: unable to execute query: Get "http://prometheus-k8s.monitoring.svc:9090/api/v1/query?query=sum+by+%28node%29+%28%0A++1+-+irate%28%0A++++node_cpu_seconds_total%7Bmode%3D%22idle%22%7D%5B60s%5D%0A++%29%0A++%2A+on%28namespace%2C+pod%29+group_left%28node%29+%28%0A++++node_namespace_pod%3Akube_pod_info%3A%7Bnode%3D%22ns344288%22%7D%0A++%29%0A%29%0Aor+sum+by+%28node%29+%28%0A++1+-+irate%28%0A++++windows_cpu_time_total%7Bmode%3D%22idle%22%2C+job%3D%22windows-exporter%22%2Cnode%3D%22ns344288%22%7D%5B4m%5D%0A++%29%0A%29%0A&time=1653139173.515": dial tcp 10.110.187.176:9090: i/o timeout
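A quick way to confirm that it is really the network policy rejecting this traffic is to run a short-lived pod in the monitoring namespace carrying the same app.kubernetes.io/name=prometheus-adapter label, so it is matched by the same policy rules. This is only a minimal sketch; the pod name and the curl image are illustrative:
apiVersion: v1
kind: Pod
metadata:
  name: netpol-probe        # illustrative name
  namespace: monitoring
  labels:
    app.kubernetes.io/name: prometheus-adapter
spec:
  restartPolicy: Never
  containers:
  - name: curl
    image: curlimages/curl
    # Times out while adapter -> prometheus traffic is blocked; succeeds once
    # the ingress rule suggested below is added to the prometheus NetworkPolicy.
    args: ["-sS", "--max-time", "5", "http://prometheus-k8s.monitoring.svc:9090/-/healthy"]
Applying this pod and then checking its logs (kubectl -n monitoring logs netpol-probe) shows whether the request gets through.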
Did you expect to see something different? The following ingress peer must be added to the NetworkPolicy of prometheus:
- from:
  - podSelector:
      matchLabels:
        app.kubernetes.io/name: prometheus-adapter
  ports:
  - port: 9090
    protocol: TCP
How to reproduce it (as minimally and precisely as possible):
(import 'kube-prometheus/main.libsonnet') +
(import 'kube-prometheus/addons/all-namespaces.libsonnet') +
Environment: k8s 1.24 deployed with kubeadm
- Prometheus Operator version:
{
"alertmanager": "0.24.0",
"blackboxExporter": "0.20.0",
"grafana": "8.5.2",
"kubeStateMetrics": "2.4.2",
"nodeExporter": "1.3.1",
"prometheus": "2.35.0",
"prometheusAdapter": "0.9.1",
"prometheusOperator": "0.56.2",
"kubeRbacProxy": "0.12.0",
"configmapReload": "0.5.0",
"pyrra": "0.3.4"
}
- Kubernetes version information:
Client Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.0", GitCommit:"4ce5a8954017644c5420bae81d72b09b735c21f0", GitTreeState:"clean", BuildDate:"2022-05-03T13:46:05Z", GoVersion:"go1.18.1", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v4.5.4
Server Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.0", GitCommit:"4ce5a8954017644c5420bae81d72b09b735c21f0", GitTreeState:"clean", BuildDate:"2022-05-03T13:38:19Z", GoVersion:"go1.18.1", Compiler:"gc", Platform:"linux/amd64"}
- Kubernetes cluster kind: kubeadm
- Manifests: ?
- Prometheus Operator Logs: no issue with the operator
- Prometheus Logs: no issue with prometheus
Anything else we need to know?:
Same problem here.
prometheus-adapter log:
E0914 02:18:28.625558 1 provider.go:284] failed querying node metrics: unable to fetch node CPU metrics: unable to execute query: Get "http://prometheus-k8s.monitoring.svc:9090/api/v1/query?query=sum+by+%28node%29+%28%0A++1+-+irate%28%0A++++node_cpu_seconds_total%7Bmode%3D%22idle%22%7D%5B60s%5D%0A++%29%0A++%2A+on%28namespace%2C+pod%29+group_left%28node%29+%28%0A++++node_namespace_pod%3Akube_pod_info%3A%7Bnode%3D~%22m1%7Cw1%7Cw2%22%7D%0A++%29%0A%29%0Aor+sum+by+%28node%29+%28%0A++1+-+irate%28%0A++++windows_cpu_time_total%7Bmode%3D%22idle%22%2C+job%3D%22windows-exporter%22%2Cnode%3D~%22m1%7Cw1%7Cw2%22%7D%5B4m%5D%0A++%29%0A%29%0A&time=1663121878.624": dial tcp 10.96.221.253:9090: i/o timeout
kubectl top queries fail:
$ kubectl get apiservice v1beta1.metrics.k8s.io
NAME SERVICE AVAILABLE AGE
v1beta1.metrics.k8s.io monitoring/prometheus-adapter True 18h
$ kubectl top node
error: metrics not available yet
$ kubectl top pod -n monitoring
error: Metrics not available for pod monitoring/alertmanager-main-0, age: 18h9m40.422618556s
I was having this issue and was able to resolve it by adding a NetworkPolicy that allows prometheus-adapter to talk to prometheus.
We use Calico for network policies, so I'm not sure of the best way to do this with a standard NetworkPolicy (I didn't try), but IMO this should definitely be part of the default policy set.
---
apiVersion: crd.projectcalico.org/v1
kind: NetworkPolicy
metadata:
  name: ingress-prometheus-adapter-to-prometheus
spec:
  serviceAccountSelector: app.kubernetes.io/name == 'prometheus'
  ingress:
  - action: Allow
    protocol: TCP
    source:
      serviceAccounts:
        selector: app.kubernetes.io/name == 'prometheus-adapter'
      namespaceSelector: projectcalico.org/name == 'metrics-system'
    destination:
      ports:
      - 9090
I think this is solved by https://github.com/prometheus-operator/kube-prometheus/pull/1870. As a quick fix, here is a network policy that is not Calico-dependent and can be applied as a patch:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  labels:
    app.kubernetes.io/component: prometheus
    app.kubernetes.io/instance: k8s
    app.kubernetes.io/name: prometheus
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 2.36.1
  name: prometheus-k8s-adapter
  namespace: monitoring
spec:
  egress:
  - {}
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app.kubernetes.io/name: prometheus-adapter
    ports:
    - port: 9090
      protocol: TCP
  podSelector:
    matchLabels:
      app.kubernetes.io/component: prometheus
      app.kubernetes.io/instance: k8s
      app.kubernetes.io/name: prometheus
      app.kubernetes.io/part-of: kube-prometheus
  policyTypes:
  - Egress
  - Ingress
status: {}
Now the kubectl top node command returns output normally, but the kubectl top pod command still fails.
I fixed the problem by allowing TCP traffic on port 9090 on each Kubernetes node:
# iptables -I INPUT -p tcp -m state --state NEW -m tcp --dport 9090 -j ACCEPT
Now both commands work for me: kubectl top nodes as well as kubectl top pods -A.
I made PR #1982 to fix this.
As a workaround, I resolved this by adding these lines to my config (only the relevant parts are shown):
local kp =
  (import 'kube-prometheus/main.libsonnet') +
  {
    // ... some other configuration here
    prometheus+:: {
      networkPolicy+: {
        spec+: {
          ingress+: [
            {
              // allow prometheus-adapter to access prometheus
              from: [{
                podSelector: {
                  matchLabels: {
                    'app.kubernetes.io/name': 'prometheus-adapter',
                  },
                },
              }],
              ports: [{
                port: 'web',
                protocol: 'TCP',
              }],
            },
          ],
        },
      },
    },
  };
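In this jsonnet patch, the named port 'web' resolves to 9090 on the Prometheus pods, so the rendered rule is equivalent to the standalone NetworkPolicy shown earlier in this thread.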