aws-workshop-for-kubernetes

201 - Prometheus not loading any metrics.

Open andrewhertog opened this issue 6 years ago • 4 comments

I'm currently following https://github.com/aws-samples/aws-workshop-for-kubernetes/tree/master/02-path-working-with-clusters/201-cluster-monitoring

I've successfully loaded Prometheus in a browser after running the proxy command kubectl port-forward $(kubectl get po -l prometheus=prometheus -n monitoring -o jsonpath={.items[0].metadata.name}) 9090 -n monitoring, but I am not seeing any metrics on localhost:9090.

This is all I see: [screenshot: 2018-06-11, 12:04:07 pm]

I have gone through 201 from the beginning twice, following the cleanup shown at the end of the tutorial, with the same results.

Update

I just did some digging and noticed a lot of the following in the logs for the prometheus-prometheus-0 pod:

level=error ts=2018-06-11T18:11:56.227803158Z caller=main.go:212 component=k8s_client_runtime err="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:177: Failed to list *v1.Endpoints: endpoints is forbidden: User \"system:serviceaccount:monitoring:prometheus-operator\" cannot list endpoints in the namespace \"kube-system\""
level=error ts=2018-06-11T18:11:56.231432703Z caller=main.go:212 component=k8s_client_runtime err="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:177: Failed to list *v1.Endpoints: endpoints is forbidden: User \"system:serviceaccount:monitoring:prometheus-operator\" cannot list endpoints in the namespace \"monitoring\""
level=error ts=2018-06-11T18:11:56.231436883Z caller=main.go:212 component=k8s_client_runtime err="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:178: Failed to list *v1.Service: services is forbidden: User \"system:serviceaccount:monitoring:prometheus-operator\" cannot list services in the namespace \"kube-system\""
level=error ts=2018-06-11T18:11:56.231516245Z caller=main.go:212 component=k8s_client_runtime err="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:177: Failed to list *v1.Endpoints: endpoints is forbidden: User \"system:serviceaccount:monitoring:prometheus-operator\" cannot list endpoints in the namespace \"monitoring\""
level=error ts=2018-06-11T18:11:56.243360069Z caller=main.go:212 component=k8s_client_runtime err="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:177: Failed to list *v1.Endpoints: endpoints is forbidden: User \"system:serviceaccount:monitoring:prometheus-operator\" cannot list endpoints in the namespace \"monitoring\""
level=error ts=2018-06-11T18:11:56.243441332Z caller=main.go:212 component=k8s_client_runtime err="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:178: Failed to list *v1.Service: services is forbidden: User \"system:serviceaccount:monitoring:prometheus-operator\" cannot list services in the namespace \"kube-system\""
level=error ts=2018-06-11T18:11:56.243462191Z caller=main.go:212 component=k8s_client_runtime err="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:178: Failed to list *v1.Service: services is forbidden: User \"system:serviceaccount:monitoring:prometheus-operator\" cannot list services in the namespace \"monitoring\""
level=error ts=2018-06-11T18:11:56.24351401Z caller=main.go:212 component=k8s_client_runtime err="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:178: Failed to list *v1.Service: services is forbidden: User \"system:serviceaccount:monitoring:prometheus-operator\" cannot list services in the namespace \"monitoring\""
level=error ts=2018-06-11T18:11:56.24351825Z caller=main.go:212 component=k8s_client_runtime err="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:177: Failed to list *v1.Endpoints: endpoints is forbidden: User \"system:serviceaccount:monitoring:prometheus-operator\" cannot list endpoints in the namespace \"kube-system\""
level=error ts=2018-06-11T18:11:56.243564991Z caller=main.go:212 component=k8s_client_runtime err="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:177: Failed to list *v1.Endpoints: endpoints is forbidden: User \"system:serviceaccount:monitoring:prometheus-operator\" cannot list endpoints in the namespace \"default\""
level=error ts=2018-06-11T18:11:56.243573509Z caller=main.go:212 component=k8s_client_runtime err="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:178: Failed to list *v1.Service: services is forbidden: User \"system:serviceaccount:monitoring:prometheus-operator\" cannot list services in the namespace \"kube-system\""
level=error ts=2018-06-11T18:11:56.243612852Z caller=main.go:212 component=k8s_client_runtime err="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:178: Failed to list *v1.Service: services is forbidden: User \"system:serviceaccount:monitoring:prometheus-operator\" cannot list services in the namespace \"monitoring\""
level=error ts=2018-06-11T18:11:56.243625904Z caller=main.go:212 component=k8s_client_runtime err="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:177: Failed to list *v1.Endpoints: endpoints is forbidden: User \"system:serviceaccount:monitoring:prometheus-operator\" cannot list endpoints in the namespace \"kube-system\""
level=error ts=2018-06-11T18:11:56.254559683Z caller=main.go:212 component=k8s_client_runtime err="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:178: Failed to list *v1.Service: services is forbidden: User \"system:serviceaccount:monitoring:prometheus-operator\" cannot list services in the namespace \"monitoring\""
level=error ts=2018-06-11T18:11:56.254651698Z caller=main.go:212 component=k8s_client_runtime err="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:178: Failed to list *v1.Service: services is forbidden: User \"system:serviceaccount:monitoring:prometheus-operator\" cannot list services in the namespace \"default\""
level=error ts=2018-06-11T18:11:56.254773778Z caller=main.go:212 component=k8s_client_runtime err="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:177: Failed to list *v1.Endpoints: endpoints is forbidden: User \"system:serviceaccount:monitoring:prometheus-operator\" cannot list endpoints in the namespace \"kube-system\""
level=error ts=2018-06-11T18:11:56.254851617Z caller=main.go:212 component=k8s_client_runtime err="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:177: Failed to list *v1.Endpoints: endpoints is forbidden: User \"system:serviceaccount:monitoring:prometheus-operator\" cannot list endpoints in the namespace \"monitoring\""
level=error ts=2018-06-11T18:11:56.254910369Z caller=main.go:212 component=k8s_client_runtime err="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:178: Failed to list *v1.Service: services is forbidden: User \"system:serviceaccount:monitoring:prometheus-operator\" cannot list services in the namespace \"kube-system\""
level=error ts=2018-06-11T18:11:56.449285051Z caller=main.go:212 component=k8s_client_runtime err="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:179: Failed to watch *v1.Pod: unknown (get pods)"
level=error ts=2018-06-11T18:11:56.452019179Z caller=main.go:212 component=k8s_client_runtime err="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:179: Failed to watch *v1.Pod: unknown (get pods)"
level=error ts=2018-06-11T18:11:56.452942474Z caller=main.go:212 component=k8s_client_runtime err="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:179: Failed to watch *v1.Pod: unknown (get pods)"
level=error ts=2018-06-11T18:11:56.453427575Z caller=main.go:212 component=k8s_client_runtime err="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:179: Failed to watch *v1.Pod: unknown (get pods)"
level=error ts=2018-06-11T18:11:56.49484826Z caller=main.go:212 component=k8s_client_runtime err="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:179: Failed to watch *v1.Pod: unknown (get pods)"
level=error ts=2018-06-11T18:11:56.496102528Z caller=main.go:212 component=k8s_client_runtime err="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:179: Failed to watch *v1.Pod: unknown (get pods)"
level=error ts=2018-06-11T18:11:56.498910773Z caller=main.go:212 component=k8s_client_runtime err="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:179: Failed to watch *v1.Pod: unknown (get pods)"
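
For anyone triaging a similar wall of errors, here is a rough Python sketch that collapses the "forbidden" lines into the distinct permissions being denied. It assumes the log format shown above, with quotes either plain or backslash-escaped; the function name `summarize_forbidden` is made up for illustration. Lines like "Failed to watch *v1.Pod: unknown (get pods)" do not match the pattern and are skipped.

```python
import re
from collections import defaultdict

# Matches the "forbidden" messages in the Prometheus log lines shown above.
# Quotes may appear escaped (\") inside the err="..." field, so both forms are accepted.
FORBIDDEN = re.compile(
    r'User \\?"(?P<user>[^"\\]+)\\?" cannot (?P<verb>\w+) (?P<resource>\w+) '
    r'in the namespace \\?"(?P<namespace>[^"\\]+)\\?"'
)


def summarize_forbidden(lines):
    """Return {(user, verb, resource): sorted list of namespaces} for denied requests."""
    found = defaultdict(set)
    for line in lines:
        match = FORBIDDEN.search(line)
        if match:
            key = (match["user"], match["verb"], match["resource"])
            found[key].add(match["namespace"])
    return {key: sorted(namespaces) for key, namespaces in found.items()}
```

Feeding it the output of kubectl logs for the pod should show, for these logs, that the prometheus-operator service account is denied list on endpoints and services in several namespaces.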

Jun 11 '18 16:06 andrewhertog

[UPDATE] I think the issue is simply that prometheus-operator and prometheus share the same RBAC here, when they should have separate roles. The other ClusterRole that should be used is https://github.com/coreos/prometheus-operator/blob/v0.14.1/Documentation/rbac.md#prometheus-rbac and Prometheus should run under a different ServiceAccount (SA), set here: https://github.com/aws-samples/aws-workshop-for-kubernetes/blob/master/02-path-working-with-clusters/201-cluster-monitoring/templates/prometheus/prometheus.yaml#L228

I realized this because the API servers appeared to be down, which happened because get on /metrics is not listed in the prometheus-operator ClusterRole. So the underlying problem is a missing cluster role.


I also hit this issue. You can just run kubectl edit clusterrole prometheus-operator -n monitoring and add the missing verbs; in your case I think that is list for endpoints/services and watch for pods. This is the RBAC that got the UI up in my case:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
[...]
rules:
- apiGroups:
  - extensions
  resources:
  - thirdpartyresources
  verbs:
  - '*'
- apiGroups:
  - apiextensions.k8s.io
  resources:
  - customresourcedefinitions
  verbs:
  - '*'
- apiGroups:
  - monitoring.coreos.com
  resources:
  - alertmanagers
  - prometheuses
  - servicemonitors
  verbs:
  - '*'
- apiGroups:
  - apps
  resources:
  - statefulsets
  verbs:
  - '*'
- apiGroups:
  - ""
  resources:
  - configmaps
  - secrets
  verbs:
  - '*'
- apiGroups:
  - ""
  resources:
  - pods
  verbs:
  - watch
  - list
  - delete
- apiGroups:
  - ""
  resources:
  - services
  - endpoints
  verbs:
  - get
  - list
  - create
  - watch
  - update
- apiGroups:
  - ""
  resources:
  - nodes
  verbs:
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - namespaces
  verbs:
  - list

Interestingly, these are not required according to the docs for 0.14.1.

Jul 06 '18 19:07 CharlyF

Can someone make the changes that @CharlyF mentioned? As of 7/14/18 this was still not working.

Jul 17 '18 01:07 jicowan

I had the same issue and had to modify the cluster role.

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus-operator
  namespace: monitoring
rules:
- apiGroups:
  - extensions
  resources:
  - thirdpartyresources
  verbs:
  - "*"
- apiGroups:
  - apiextensions.k8s.io
  resources:
  - customresourcedefinitions
  verbs:
  - "*"
- apiGroups:
  - monitoring.coreos.com
  resources:
  - alertmanagers
  - prometheuses
  - servicemonitors
  - prometheusrules
  verbs:
  - "*"
- apiGroups:
  - apps
  resources:
  - statefulsets
  verbs: ["*"]
- apiGroups: [""]
  resources:
  - configmaps
  - secrets
  verbs: ["*"]
- apiGroups: [""]
  resources:
  - pods
  verbs: ["list", "delete", "watch"]
- apiGroups: [""]
  resources:
  - services
  - endpoints
  verbs: ["get", "create", "update", "watch", "list"]
- apiGroups: [""]
  resources:
  - nodes
  verbs: ["list", "watch"]
- nonResourceURLs:
  - /metrics
  verbs: ["get"]
- apiGroups: [""]
  resources:
  - namespaces
  verbs: ["list"]
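
If you want to sanity-check a rule set like this against the requests that were failing in the logs before applying it, a simplified matcher can be sketched in Python. This is only an approximation of RBAC evaluation, not how the API server does it: it ignores resourceNames and subresources, and it matches nonResourceURLs exactly or via "*" only. The helper names `rule_allows` and `allowed` are hypothetical, and `RULES` transcribes just three of the rules above.

```python
def rule_allows(rule, verb, resource=None, api_group="", url=None):
    """Simplified check of a single ClusterRole rule (a dict parsed from YAML).

    Assumption: ignores resourceNames and subresources, and matches
    nonResourceURLs exactly or by "*" only.
    """
    verbs = rule.get("verbs", [])
    if "*" not in verbs and verb not in verbs:
        return False
    if url is not None:
        urls = rule.get("nonResourceURLs", [])
        return "*" in urls or url in urls
    groups = rule.get("apiGroups", [])
    resources = rule.get("resources", [])
    return ("*" in groups or api_group in groups) and (
        "*" in resources or resource in resources
    )


def allowed(rules, verb, **request):
    """True if any rule in the list permits the request."""
    return any(rule_allows(rule, verb, **request) for rule in rules)


# Three of the rules from the ClusterRole above, transcribed as dicts.
RULES = [
    {"apiGroups": [""], "resources": ["pods"], "verbs": ["list", "delete", "watch"]},
    {
        "apiGroups": [""],
        "resources": ["services", "endpoints"],
        "verbs": ["get", "create", "update", "watch", "list"],
    },
    {"nonResourceURLs": ["/metrics"], "verbs": ["get"]},
]

# Requests that were denied in the logs earlier in the thread.
print(allowed(RULES, "list", resource="endpoints"))  # True
print(allowed(RULES, "watch", resource="pods"))      # True
print(allowed(RULES, "get", url="/metrics"))         # True
# Not granted by these rules: get on pods.
print(allowed(RULES, "get", resource="pods"))        # False
```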

Sep 06 '18 21:09 dannyvargas23

To be a little more precise: if you followed the directions in the guide and you have a blank Targets page (and the prometheus container in the prometheus-prometheus-1 pod is showing errors in the log like the ones shown earlier in this thread), then you need to:

  1. Copy the entire code block from @dannyvargas23's comment above and paste it directly into the prometheus-bundle.yaml file.
  2. Run the command kubectl apply -f templates/prometheus/prometheus-bundle.yaml again, to apply the changes.

After a couple of minutes, you should start seeing Targets 'UP' in the Prometheus UI.
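
If you would rather check from a script than refresh the Targets page, something like the following Python sketch could work. It assumes the kubectl port-forward from the top of the thread is still running on localhost:9090 and that your Prometheus version exposes the /api/v1/targets HTTP API endpoint; `target_health` and `fetch_targets` are illustrative names, not part of the workshop.

```python
import json
import urllib.request
from collections import Counter


def target_health(payload):
    """Count active targets by health from a parsed /api/v1/targets response."""
    targets = payload.get("data", {}).get("activeTargets", [])
    return Counter(target.get("health", "unknown") for target in targets)


def fetch_targets(base_url="http://localhost:9090"):
    """Fetch and parse the targets list from the Prometheus HTTP API.

    Assumption: Prometheus is reachable at base_url, for example through
    the kubectl port-forward command used earlier in this thread.
    """
    with urllib.request.urlopen(f"{base_url}/api/v1/targets", timeout=10) as resp:
        return json.load(resp)
```

With the port-forward running, target_health(fetch_targets()) should return a count per health state; a result with no entries at all would correspond to the blank Targets page described above.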

I'll file a PR with this change; hopefully it can get merged soon!

Sep 12 '18 15:09 geerlingguy