
prometheus-adapter: failed querying node metrics

jouve opened this issue 3 years ago

What happened? When using network policies, I noticed that the Kubernetes dashboard was no longer showing metrics. This is caused by traffic from prometheus-adapter to prometheus-k8s being rejected, which is confirmed by the prometheus-adapter logs, where the following error appears:

E0521 13:20:03.515525       1 provider.go:272] failed querying node metrics: unable to fetch node CPU metrics: unable to execute query: Get "http://prometheus-k8s.monitoring.svc:9090/api/v1/query?query=sum+by+%28node%29+%28%0A++1+-+irate%28%0A++++node_cpu_seconds_total%7Bmode%3D%22idle%22%7D%5B60s%5D%0A++%29%0A++%2A+on%28namespace%2C+pod%29+group_left%28node%29+%28%0A++++node_namespace_pod%3Akube_pod_info%3A%7Bnode%3D%22ns344288%22%7D%0A++%29%0A%29%0Aor+sum+by+%28node%29+%28%0A++1+-+irate%28%0A++++windows_cpu_time_total%7Bmode%3D%22idle%22%2C+job%3D%22windows-exporter%22%2Cnode%3D%22ns344288%22%7D%5B4m%5D%0A++%29%0A%29%0A&time=1653139173.515": dial tcp 10.110.187.176:9090: i/o timeout

Did you expect to see something different? The following ingress peer must be added to the Prometheus NetworkPolicy:

  - from:
    - podSelector:
        matchLabels:
          app.kubernetes.io/name: prometheus-adapter
    ports:
    - port: 9090
      protocol: TCP
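
For an already-deployed stack, the same peer can also be added in place with kubectl patch. This is only a sketch: it assumes the policy generated by kube-prometheus is named prometheus-k8s in the monitoring namespace, and the change is lost whenever the manifests are regenerated and re-applied without it:

# append the prometheus-adapter ingress peer to the existing policy
kubectl -n monitoring patch networkpolicy prometheus-k8s --type=json -p '[
  {
    "op": "add",
    "path": "/spec/ingress/-",
    "value": {
      "from": [{"podSelector": {"matchLabels": {"app.kubernetes.io/name": "prometheus-adapter"}}}],
      "ports": [{"port": 9090, "protocol": "TCP"}]
    }
  }
]'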

How to reproduce it (as minimally and precisely as possible):

  (import 'kube-prometheus/main.libsonnet') +
  (import 'kube-prometheus/addons/all-namespaces.libsonnet') +

Environment: k8s 1.24 deployed with kubeadm

  • Prometheus Operator version:
{
  "alertmanager": "0.24.0",
  "blackboxExporter": "0.20.0",
  "grafana": "8.5.2",
  "kubeStateMetrics": "2.4.2",
  "nodeExporter": "1.3.1",
  "prometheus": "2.35.0",
  "prometheusAdapter": "0.9.1",
  "prometheusOperator": "0.56.2",
  "kubeRbacProxy": "0.12.0",
  "configmapReload": "0.5.0",
  "pyrra": "0.3.4"
}
  • Kubernetes version information:
Client Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.0", GitCommit:"4ce5a8954017644c5420bae81d72b09b735c21f0", GitTreeState:"clean", BuildDate:"2022-05-03T13:46:05Z", GoVersion:"go1.18.1", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v4.5.4
Server Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.0", GitCommit:"4ce5a8954017644c5420bae81d72b09b735c21f0", GitTreeState:"clean", BuildDate:"2022-05-03T13:38:19Z", GoVersion:"go1.18.1", Compiler:"gc", Platform:"linux/amd64"}
  • Kubernetes cluster kind:

kubeadm

  • Manifests: ?

  • Prometheus Operator Logs: no issue with the operator

  • Prometheus Logs: no issue with prometheus

Anything else we need to know?:

jouve avatar May 21 '22 13:05 jouve

Same problem here.

prometheus-adapter log:

E0914 02:18:28.625558       1 provider.go:284] failed querying node metrics: unable to fetch node CPU metrics: unable to execute query: Get "http://prometheus-k8s.monitoring.svc:9090/api/v1/query?query=sum+by+%28node%29+%28%0A++1+-+irate%28%0A++++node_cpu_seconds_total%7Bmode%3D%22idle%22%7D%5B60s%5D%0A++%29%0A++%2A+on%28namespace%2C+pod%29+group_left%28node%29+%28%0A++++node_namespace_pod%3Akube_pod_info%3A%7Bnode%3D~%22m1%7Cw1%7Cw2%22%7D%0A++%29%0A%29%0Aor+sum+by+%28node%29+%28%0A++1+-+irate%28%0A++++windows_cpu_time_total%7Bmode%3D%22idle%22%2C+job%3D%22windows-exporter%22%2Cnode%3D~%22m1%7Cw1%7Cw2%22%7D%5B4m%5D%0A++%29%0A%29%0A&time=1663121878.624": dial tcp 10.96.221.253:9090: i/o timeout

The kubectl top commands fail:

$ kubectl get apiservice v1beta1.metrics.k8s.io
NAME                     SERVICE                         AVAILABLE   AGE
v1beta1.metrics.k8s.io   monitoring/prometheus-adapter   True        18h

$ kubectl top node
error: metrics not available yet

$ kubectl top pod -n monitoring 
error: Metrics not available for pod monitoring/alertmanager-main-0, age: 18h9m40.422618556s
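
One way to confirm that it is the NetworkPolicy (and not something else) dropping the traffic is to run a throwaway pod that carries the prometheus-adapter label and probe Prometheus from it. This is just a sketch: the busybox image and the prometheus-k8s service name are assumptions; the request should time out while the ingress rule is missing and succeed once it is in place.

# short-lived pod labelled like the adapter, probing the Prometheus ready endpoint
kubectl -n monitoring run netpol-check --rm -it --restart=Never \
  --image=busybox:1.36 \
  --labels=app.kubernetes.io/name=prometheus-adapter \
  -- wget -qO- -T 5 http://prometheus-k8s.monitoring.svc:9090/-/ready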

aluopy avatar Sep 14 '22 02:09 aluopy

I was having this issue and was able to resolve it by adding a NetworkPolicy that allows prometheus-adapter to talk to Prometheus.

We use Calico for network policies, so I'm not sure of the best way to do this with a standard NetworkPolicy (I didn't try), but IMO this definitely should be part of the default policy set.

---
apiVersion: crd.projectcalico.org/v1
kind: NetworkPolicy
metadata:
  name: ingress-prometheus-adapter-to-prometheus
spec:
  serviceAccountSelector: app.kubernetes.io/name == 'prometheus'
  ingress:
    - action: Allow
      protocol: TCP
      source:
        serviceAccounts:
          selector: app.kubernetes.io/name == 'prometheus-adapter'
        namespaceSelector: projectcalico.org/name == 'metrics-system'
      destination:
        ports:
        - 9090

joshperry avatar Sep 20 '22 06:09 joshperry

I think this is solved by https://github.com/prometheus-operator/kube-prometheus/pull/1870. As a quick fix, here is a NetworkPolicy that is not Calico-dependent and can be applied as a patch:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  labels:
    app.kubernetes.io/component: prometheus
    app.kubernetes.io/instance: k8s
    app.kubernetes.io/name: prometheus
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 2.36.1
  name: prometheus-k8s-adapter
  namespace: monitoring
spec:
  egress:
  - {}
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app.kubernetes.io/name: prometheus-adapter
    ports:
    - port: 9090
      protocol: TCP
  podSelector:
    matchLabels:
      app.kubernetes.io/component: prometheus
      app.kubernetes.io/instance: k8s
      app.kubernetes.io/name: prometheus
      app.kubernetes.io/part-of: kube-prometheus
  policyTypes:
  - Egress
  - Ingress
status: {}
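
To apply it standalone, something like the following should work (a sketch; the file name is arbitrary):

kubectl apply -f prometheus-k8s-adapter-networkpolicy.yaml
kubectl -n monitoring describe networkpolicy prometheus-k8s-adapter
kubectl top node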

everflux avatar Oct 05 '22 18:10 everflux

After applying the NetworkPolicy above, kubectl top node now works, but kubectl top pod still fails.

aluopy avatar Oct 08 '22 01:10 aluopy

I fixed the problem by allowing TCP traffic on port 9090 on each Kubernetes node:

# iptables -I INPUT -p tcp -m state --state NEW -m tcp --dport 9090 -j ACCEPT

Now both commands work for me: kubectl top nodes as well as kubectl top pods -A.
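
For completeness: the rule can be verified with iptables -C (exit code 0 means it is present), and note that a rule added this way does not survive a reboot unless it is persisted, e.g. with iptables-save and your distribution's persistence mechanism.

# iptables -C INPUT -p tcp -m state --state NEW -m tcp --dport 9090 -j ACCEPT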

sys-ops avatar Dec 24 '22 08:12 sys-ops

I made a PR #1982 to fix this.

As a workaround, I resolved this by adding these lines to my config (only the relevant parts are shown):

local kp =
  (import 'kube-prometheus/main.libsonnet') +
  {
    // ... some other configuration here
    prometheus+:: {
      networkPolicy+: {
        spec+: {
          ingress+: [
            {
              // allow prometheus adapter to access prometheus
              from: [{
                podSelector: {
                  matchLabels: {
                    'app.kubernetes.io/name': 'prometheus-adapter',
                  },
                },
              }],
              ports: [{
                port: 'web',
                protocol: 'TCP',
              }],
            },
          ],
        },
      },
    },
  };
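
After regenerating the manifests from this jsonnet (e.g. with the stock build.sh) and re-applying them, the extra ingress rule should show up in the generated policy. A quick check, assuming the default manifests directory and the generated policy name prometheus-k8s:

kubectl apply -f manifests/
kubectl -n monitoring describe networkpolicy prometheus-k8s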

Thrimbda avatar Jan 06 '23 06:01 Thrimbda