
incomplete install instructions?

Open sensay-nelson opened this issue 6 years ago • 35 comments

It appears that in addition to the Node Exporter and Kube State Metrics, a 3rd component (prometheus scraper) must be manually added by the user in order for this to function.

A user must manually do the following for this to work:

  • install the configmap (provided in the grafana kubernetes app configuration interface)
  • run a prometheus pod which uses said configmap to scrape metrics.

Without these steps, almost no metrics will work. These requirements are missing from the readme.
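For the second step, the deployment amounts to something like the sketch below (a hedged sketch, not the app's own manifest: the configmap name, service account name, and image tag are assumptions; the configmap is the one provided in the app's configuration interface):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      serviceAccountName: prometheus   # needs RBAC allowing node/pod discovery
      containers:
      - name: prometheus
        image: prom/prometheus:v2.6.0  # assumed tag
        args: ["--config.file=/etc/prometheus/prometheus.yml"]
        ports:
        - containerPort: 9090
        volumeMounts:
        - name: config
          mountPath: /etc/prometheus
      volumes:
      - name: config
        configMap:
          name: prometheus             # the configmap provided by the app
```

Expose it with a Service and point the app's Prometheus datasource at that Service.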

The Deploy button claims to deploy the following:
(1) A prometheus configmap which contains the prometheus jobs that collect the metrics used by the dashboards in the kubernetes app
  - Incorrect: the grafana kubernetes app does not do this ^^
(2) a Node Exporter deployment, and
(3) a Kube-State Metrics deployment

Unless I am missing something?

sensay-nelson avatar Jan 28 '19 04:01 sensay-nelson

Looking closer, I'm seeing a lot of these errors in the kube-state-metrics pods, so perhaps my issue is permissions-related.

Do we know what permissions this container requires? I do not see any serviceAccountName in the deployment configuration. Once we know the permissions, how are they to be assigned?

E0128 04:33:50.024112       1 reflector.go:205] k8s.io/kube-state-metrics/collectors/replicaset.go:87: Failed to list *v1beta1.ReplicaSet: replicasets.extensions is forbidden: User "system:serviceaccount:kube-system:default" cannot list replicasets.extensions at the cluster scope
E0128 04:33:50.114785       1 reflector.go:205] k8s.io/kube-state-metrics/collectors/resourcequota.go:67: Failed to list *v1.ResourceQuota: resourcequotas is forbidden: User "system:serviceaccount:kube-system:default" cannot list resourcequotas at the cluster scope
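One way to confirm exactly which permission is missing is to ask the API server with a SubjectAccessReview (a sketch; `kubectl auth can-i list replicasets.extensions --as=system:serviceaccount:kube-system:default` does the same check from the CLI):

```yaml
apiVersion: authorization.k8s.io/v1
kind: SubjectAccessReview
spec:
  user: system:serviceaccount:kube-system:default
  resourceAttributes:
    group: extensions
    resource: replicasets
    verb: list
```

Creating this with kubectl create -f sar.yaml -o yaml should come back with status.allowed: false until a suitable role is bound.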

sensay-nelson avatar Jan 28 '19 04:01 sensay-nelson

I fixed the permission issues by adding a service account for kube-state-metrics to the kubernetes configuration, using the configs below.
Unfortunately, it did not improve any of the information available in the dashboards, though the error messaging in the kube-state-metrics container was reduced significantly.

kubectl --namespace=kube-system create -f kube-state-metrics-role.yaml

kube-state-metrics-role.yaml

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kube-state-metrics

---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: kube-state-metrics
rules:
- apiGroups:
  - ""
  resources:
  - configmaps
  - secrets
  - nodes
  - pods
  - services
  - resourcequotas
  - replicationcontrollers
  - limitranges
  - persistentvolumeclaims
  - persistentvolumes
  - namespaces
  - endpoints
  verbs:
  - list
  - watch
- apiGroups:
  - extensions
  resources:
  - daemonsets
  - deployments
  - replicasets
  verbs:
  - list
  - watch
- apiGroups:
  - apps
  resources:
  - statefulsets
  - daemonsets
  - deployments
  - replicasets
  verbs:
  - list
  - watch
- apiGroups:
  - batch
  resources:
  - cronjobs
  - jobs
  verbs:
  - list
  - watch
- apiGroups:
  - autoscaling
  resources:
  - horizontalpodautoscalers
  verbs:
  - list
  - watch
- apiGroups:
  - authentication.k8s.io
  resources:
  - tokenreviews
  verbs:
  - create
- apiGroups:
  - authorization.k8s.io
  resources:
  - subjectaccessreviews
  verbs:
  - create

---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: kube-state-metrics
roleRef:
  kind: ClusterRole
  name: kube-state-metrics
  apiGroup: rbac.authorization.k8s.io
subjects:
- kind: ServiceAccount
  name: kube-state-metrics
  namespace: kube-system
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    grafanak8sapp: "true"
    k8s-app: kube-state-metrics
  name: kube-state-metrics
  namespace: kube-system
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 2
  selector:
    matchLabels:
      grafanak8sapp: "true"
      k8s-app: kube-state-metrics
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      labels:
        grafanak8sapp: "true"
        k8s-app: kube-state-metrics
    spec:
      serviceAccountName: kube-state-metrics
      containers:
      - image: quay.io/coreos/kube-state-metrics:v1.1.0
        imagePullPolicy: IfNotPresent
        name: kube-state-metrics
        ports:
        - containerPort: 8080
          name: http-metrics
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /healthz
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 5
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30

sensay-nelson avatar Jan 28 '19 06:01 sensay-nelson

I added a Service for the node-exporter to expose 9100 and pointed the Prometheus datasource in grafana at it. I don't think this is what I'm supposed to do, and naturally it still doesn't work.

sensay-nelson avatar Jan 28 '19 08:01 sensay-nelson

I also created a Service for kube-state-metrics; still not the metrics the app is looking for. For instance, the "Total Memory Usage" panel on the "K8s Container" dashboard tries to calculate the following:

sum(container_memory_usage_bytes{pod_name=~"$pod"}) by (pod_name)

If I examine it with the query inspector, it's making the request below. I assume it is trying to hit the kubernetes api, although api/v1/query_range is not a kubernetes route I'm familiar with. Right now my Prometheus data source is set to the kube-state-metrics service, so obviously this query fails. I'm not sure how to point it back at the kubernetes datasource; the way this app works is very confusing.

xhrStatus:"complete"
request:Object
method:"GET"
url:"api/datasources/proxy/17/api/v1/query_range?query=sum(container_memory_usage_bytes%7Bpod_name%3D~%22contact-bot-7779886947-pwfqd%7Cdfuse-events-77f74d44bc-n9lmm%7Ceos-monitor-568bf8688-sm96w%7Ckeosd-7f4d745d-lp45w%7Clogspout-papertrail-7ql54%7Clogspout-papertrail-wrsm2%7Cmake-sense-app-fdffd4495-nwzr4%7Cmetabase-9ff4bdf5c-vc75k%7Cnats-7958747d76-ccvrb%7Cnginx-7bc66d857-dz5hs%7Cnginx-7bc66d857-lkjql%7Cprometheus-7d584c557-4xc2j%7Csendy-59b94fb496-8rxxl%7Csense-registration-5bb8f77b5c-k7xkl%7Csense-registration-5bb8f77b5c-pnbqz%7Csensetoken-6554c74bb7-75qrl%7Csensetoken-6554c74bb7-zxkcw%7Cdns-controller-6f9fb9cf78-849nb%7Cetcd-server-events-ip-172-22-10-5%5C%5C.us-west-2%5C%5C.compute%5C%5C.internal%7Cetcd-server-ip-172-22-10-5%5C%5C.us-west-2%5C%5C.compute%5C%5C.internal%7Ckube-apiserver-ip-172-22-10-5%5C%5C.us-west-2%5C%5C.compute%5C%5C.internal%7Ckube-controller-manager-ip-172-22-10-5%5C%5C.us-west-2%5C%5C.compute%5C%5C.internal%7Ckube-dns-7c4d8456dd-hwq49%7Ckube-dns-7c4d8456dd-vnws9%7Ckube-dns-autoscaler-f4c47db64-wm4cs%7Ckube-proxy-ip-172-22-10-5%5C%5C.us-west-2%5C%5C.compute%5C%5C.internal%7Ckube-proxy-ip-172-22-20-181%5C%5C.us-west-2%5C%5C.compute%5C%5C.internal%7Ckube-proxy-ip-172-22-30-172%5C%5C.us-west-2%5C%5C.compute%5C%5C.internal%7Ckube-scheduler-ip-172-22-10-5%5C%5C.us-west-2%5C%5C.compute%5C%5C.internal%7Ckube-state-metrics-699cf64f48-8t46r%7Ckubernetes-dashboard-7798c48646-kfg4h%7Cnode-exporter-cz6hj%7Cnode-exporter-wrbzs%7Cweave-net-l2bqf%7Cweave-net-pzfwc%7Cweave-net-twgsk%22%7D)%20by%20(pod_name)&start=1548678030&end=1548679845&step=15"
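Decoding that request shows what is actually happening: Grafana is proxying a standard Prometheus /api/v1/query_range call through its datasource proxy, so the target has to be a Prometheus server, not the kubernetes api. A quick way to inspect such a URL (the sample below is a shortened version of the request above, keeping just two pod names):

```python
from urllib.parse import urlsplit, parse_qs

# Shortened version of the proxied request captured by the query inspector
url = ("api/datasources/proxy/17/api/v1/query_range"
       "?query=sum(container_memory_usage_bytes%7Bpod_name%3D~%22"
       "nginx-7bc66d857-dz5hs%7Cnats-7958747d76-ccvrb%22%7D)%20by%20(pod_name)"
       "&start=1548678030&end=1548679845&step=15")

# parse_qs splits the query string and percent-decodes each value
params = {k: v[0] for k, v in parse_qs(urlsplit(url).query).items()}
print(params["query"])
# sum(container_memory_usage_bytes{pod_name=~"nginx-7bc66d857-dz5hs|nats-7958747d76-ccvrb"}) by (pod_name)
print(params["start"], params["end"], params["step"])
```

The decoded query parameter is plain PromQL, which is why pointing the datasource at kube-state-metrics (which only serves /metrics) fails.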

sensay-nelson avatar Jan 28 '19 12:01 sensay-nelson

w00t, some progress. As I suspected at the start, in addition to kube-state-metrics and node-exporter, you need to manually create a prometheus pod with the provided configmap, expose that pod via a Service, and then use it as the Prometheus data source in the k8s app config for this cluster. I am now getting metrics for these dashboards:

  • K8s Cluster dashboard
  • K8s Deployments
  • K8s Nodes

The "K8s Container" dashboard shows the containers, but none of the metrics are working, sadly.

I would love to get the per-pod metrics working, as these are probably the most useful stats for establishing resource constraints - one of the more challenging tasks in managing a k8s cluster. If I can get this last part figured out, I'll wrap up all these findings in a pull request (improving the readme, if nothing else).

sensay-nelson avatar Jan 28 '19 13:01 sensay-nelson

So far, I have narrowed it down to the cadvisor metrics: either the prometheus config for collecting them is not populating, the naming is off, or it is possibly a permissions issue. This is the query from the Grafana kubernetes app (K8s Container dashboard, memory usage):

sum(container_memory_usage_bytes{pod_name=~"my-container-7779886947-pwfqd|other-container-77f74d44bc-n9lmm"}) by (pod_name)

However, when I go directly to /metrics in prometheus, container_memory_usage_bytes is not among the available metrics. The prometheus config looks correct, though. If I hit the metrics/cadvisor route through a kubectl proxy to the api, the prometheus-format output is there:

http://127.0.0.1:8001/api/v1/nodes/my-hostname/proxy/metrics/cadvisor

...
container_memory_usage_bytes{container_name="",id="/",image="",name="",namespace="",pod_name=""} 5.430976512e+09
...

Prometheus configuration.

    scrape_configs:
    - job_name: 'kubernetes-kubelet'
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        insecure_skip_verify: true
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __address__
        replacement: kubernetes.default.svc:443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics
    - job_name: 'kubernetes-cadvisor'
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        insecure_skip_verify: true
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __address__
        replacement: kubernetes.default.svc:443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
    - job_name: 'kubernetes-kube-state'
      kubernetes_sd_configs:
      - role: pod
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: kubernetes_pod_name
      - source_labels: [__meta_kubernetes_pod_label_grafanak8sapp]
        regex: .*true.*
        action: keep
      - source_labels: ['__meta_kubernetes_pod_label_daemon', '__meta_kubernetes_pod_node_name']
        regex: 'node-exporter;(.*)'
        action: replace
        target_label: nodename

Queries for kube-state appear to be fine.
Perhaps it's a permission issue? My prometheus container is using the same role I posted above.

sensay-nelson avatar Jan 28 '19 20:01 sensay-nelson

Yep, permission issue. I started prometheus with log level debug: command: ["prometheus","--config.file=/etc/prometheus/prometheus.yml","--log.level=debug"], and there are the beautiful 403s.

level=debug ts=2019-01-29T05:34:37.90956911Z caller=scrape.go:825 component="scrape manager" scrape_pool=kubernetes-kubelet target=https://kubernetes.default.svc:443/api/v1/nodes/ip-172-22-20-181.us-west-2.compute.internal/proxy/metrics msg="Scrape failed" err="server returned HTTP status 403 Forbidden"
level=debug ts=2019-01-29T05:34:41.453264939Z caller=scrape.go:825 component="scrape manager" scrape_pool=kubernetes-cadvisor target=https://kubernetes.default.svc:443/api/v1/nodes/ip-172-22-10-5.us-west-2.compute.internal/proxy/metrics/cadvisor msg="Scrape failed" err="server returned HTTP status 403 Forbidden"

sensay-nelson avatar Jan 29 '19 05:01 sensay-nelson

Oh, this is one of those fun problems that make you question all of your life decisions.

This can't be fixed with simple rbac rules; it requires flags set on the kubelet which nicely reduce security. The problem and solutions are nicely summarized at https://github.com/coreos/prometheus-operator/issues/633

While it's easy to accomplish manually on currently running nodes, for the change to be maintained you will need to dig into whatever tool you use to create/manage clusters. I'm using kops. This is how to get it done on running nodes:

sudo vi /etc/sysconfig/kubelet

Add
--authentication-token-webhook=true --authorization-mode=Webhook
to the DAEMON_ARGS, then:

sudo systemctl restart kubelet
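After the edit, the file should look roughly like this (a sketch; the existing DAEMON_ARGS flags vary per cluster and are represented by the placeholder):

```
# /etc/sysconfig/kubelet
DAEMON_ARGS="<existing flags> --authentication-token-webhook=true --authorization-mode=Webhook"
```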

edit: nope, not quite. Adding those flags may have worked, but it blocked cert authentication - e.g. viewing logs using kubectl with a client certificate. Reading carefully at https://github.com/coreos/prometheus-operator/tree/master/contrib/kube-prometheus#prerequisites, it appears that when --authorization-mode=Webhook is set, cert authorization will not work - it's one or the other. Existing solutions assume a cluster set up with pure rbac authorization and authentication, which is unfortunately not the case for my kops cluster.

I see some solutions around using an http rather than https request, https://github.com/kubernetes/kops/issues/5176#issuecomment-391603121, but I'm unsure how to manually alter the prometheus configs. With the prometheus-operator, like all things helm, it's difficult to stitch together what the final templates look like.

sensay-nelson avatar Jan 29 '19 12:01 sensay-nelson

After 3 days of troubleshooting, I unfortunately must concede defeat. If anyone gets this to work on a k8s cluster 1.8+, please do chime in.

edit: I CONCEDE NOTHING! Finally got the K8s Container dashboard to work and now have memory stats by container... yay!

I came across this post again (which, ironically, was one of the first things I read while troubleshooting): https://github.com/prometheus/prometheus/pull/2918. It made a little more sense to me now. Mucking around with the routes in kubectl proxy, I was able to find a config that works.

Everything will vary a little bit based on your cluster setup. The material component appears to be my usage of kops vs kubeadm for cluster setup.

  • Kops does not currently support the Webhook authorization flag on the kubelet - this is required to access the metrics/cadvisor route via the api proxy to the kubelet on the standard port, and is basically the underlying issue.

  • Kops also does not disable the insecure, no-auth-required cadvisor port 4194 by default, which provides us with a viable solution: node:4194/proxy/metrics (in at least k8s 1.8) includes the stats that are normally accessed via metrics/cadvisor.

kubeadm does the opposite of both of those, for better or worse, which is why getting a straight answer has been challenging.

This is my final configmap for the prometheus scraper. Only one line is modified, on the kubernetes-cadvisor job: replacement: /api/v1/nodes/${1}:4194/proxy/metrics

apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus
  namespace: kube-system
data:
  prometheus.yml: |
    scrape_configs:
    - job_name: 'kubernetes-kubelet'
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        insecure_skip_verify: true
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __address__
        replacement: kubernetes.default.svc:443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics

    - job_name: 'kubernetes-cadvisor'
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        insecure_skip_verify: true
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __address__
        replacement: kubernetes.default.svc:443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}:4194/proxy/metrics

    - job_name: 'kubernetes-kube-state'
      kubernetes_sd_configs:
      - role: pod
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: kubernetes_pod_name
      - source_labels: [__meta_kubernetes_pod_label_grafanak8sapp]
        regex: .*true.*
        action: keep
      - source_labels: ['__meta_kubernetes_pod_label_daemon', '__meta_kubernetes_pod_node_name']
        regex: 'node-exporter;(.*)'
        action: replace
        target_label: nodename

The ServiceAccount and ClusterRole for prometheus can be grabbed from: https://github.com/prometheus/prometheus/blob/master/documentation/examples/rbac-setup.yml
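For convenience, the key rule in that rbac-setup example is nodes/proxy, which is what permits the /api/v1/nodes/<node>/proxy/... scrape paths; the ClusterRole looks roughly like this (a sketch - verify against the linked file):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources: ["nodes", "nodes/proxy", "services", "endpoints", "pods"]
  verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
```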

If you're curious, the prometheus-kubernetes.yml example in that same directory has some conflicting information regarding the scrape configs, depending on the version of k8s deployed. There is a versioning issue, but also the issue of how your cluster's kubelet authorization and authentication flags are set, which is not addressed. I'll aim for a pull request which will hopefully add clarity to the situation.

sensay-nelson avatar Jan 29 '19 16:01 sensay-nelson

I keep getting "Query support not implemented yet" in the dashboard. But I can see all the pod metrics individually. Any ideas?

illectronic avatar Jan 30 '19 20:01 illectronic

@illectronic that error comes from this repo: https://github.com/grafana/kubernetes-app/blob/ddf616e74c2146e72529316f4fb0348b787e38f4/dist/datasource/datasource.ts#L134

Looks like it's related to the kubernetes api source, but I'm not sure what triggers it exactly.

sensay-nelson avatar Jan 31 '19 17:01 sensay-nelson

Somewhere in my toiling I lost the node data :(. Not having an architecture diagram that describes the source aggregation is driving me nuts.

sensay-nelson avatar Jan 31 '19 18:01 sensay-nelson

> It appears that in addition to the Node Exporter and Kube State Metrics, a 3rd component (prometheus scraper) must be manually added by the user in order for this to function. […]

Hi, I am stuck; not sure what I have to do with respect to prometheus. Could you please help me with that?

I have a running k8s cluster and the grafana configuration is set up; what do I have to do with respect to prometheus? Please help me with pointers, right from installation.

sakthishanmugam02 avatar Feb 01 '19 22:02 sakthishanmugam02

> It appears that in addition to the Node Exporter and Kube State Metrics, a 3rd component (prometheus scraper) must be manually added by the user in order for this to function. […]

How do I install the configmap? Which one? Could you please elaborate? Also, how do I download the prometheus pod and run it?

sakthishanmugam02 avatar Feb 02 '19 19:02 sakthishanmugam02

@sakthishanmugam02 I'm working on a pull request that should hopefully help you. Give me a few hours.

sensay-nelson avatar Feb 03 '19 00:02 sensay-nelson

@sakthishanmugam02 here ya go: https://github.com/sensay-nelson/kubernetes-app/pull/1

sensay-nelson avatar Feb 03 '19 05:02 sensay-nelson

@sensay-nelson when I try to deploy the configuration:

kubectl get deploy -n kube-system
NAME                 READY   UP-TO-DATE   AVAILABLE   AGE
coredns              2/2     2            2           140m
kube-state-metrics   0/1     0            0           38m
metrics-server       1/1     1            1           57m
prometheus           0/1     0            0           34m

sakthishanmugam02 avatar Feb 03 '19 13:02 sakthishanmugam02

@sakthishanmugam02 did you create the service account first? what does kubectl -n kube-system describe pod <pod> tell you the issue is?

sensay-nelson avatar Feb 03 '19 13:02 sensay-nelson

@sensay-nelson I don't see any pod running for prometheus and kube-state-metrics:

kubectl get pods -n kube-system
NAME                                     READY   STATUS    RESTARTS   AGE
coredns-86c58d9df4-6gkg4                 1/1     Running   0          152m
coredns-86c58d9df4-xngh7                 1/1     Running   0          152m
etcd-hwim-perf-test                      1/1     Running   0          151m
kube-apiserver-hwim-perf-test            1/1     Running   0          58m
kube-controller-manager-hwim-perf-test   1/1     Running   2          151m
kube-flannel-ds-amd64-ndhkk              1/1     Running   0          131m
kube-proxy-hw2fs                         1/1     Running   0          152m
kube-scheduler-hwim-perf-test            1/1     Running   2          151m
metrics-server-68d85f76bb-db22h          1/1     Running   0          60m
node-exporter-mvc2c                      1/1     Running   0          48m

sakthishanmugam02 avatar Feb 03 '19 13:02 sakthishanmugam02

@sensay-nelson output of describe deploy:

root@hwim-perf-test:~/prom-config# kubectl describe deploy kube-state-metrics -n kube-system
Name:                   kube-state-metrics
Namespace:              kube-system
CreationTimestamp:      Sun, 03 Feb 2019 13:56:01 +0000
Labels:                 grafanak8sapp=true
                        k8s-app=kube-state-metrics
Annotations:            deployment.kubernetes.io/revision: 1
Selector:               grafanak8sapp=true,k8s-app=kube-state-metrics
Replicas:               1 desired | 0 updated | 0 total | 0 available | 1 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:           grafanak8sapp=true
                    k8s-app=kube-state-metrics
  Service Account:  prometheus
  Containers:
   kube-state-metrics:
    Image:      quay.io/coreos/kube-state-metrics:v1.1.0
    Port:       8080/TCP
    Host Port:  0/TCP
    Readiness:  http-get http://:8080/healthz delay=5s timeout=5s period=10s #success=1 #failure=3
Conditions:
  Type            Status  Reason
  Progressing     True    NewReplicaSetCreated
  Available       False   MinimumReplicasUnavailable
  ReplicaFailure  True    FailedCreate
OldReplicaSets:
NewReplicaSet:    kube-state-metrics-6bdd878bd7 (0/1 replicas created)
Events:
  Type    Reason             Age  From                   Message
  Normal  ScalingReplicaSet  44s  deployment-controller  Scaled up replica set kube-state-metrics-6bdd878bd7 to 1

sakthishanmugam02 avatar Feb 03 '19 13:02 sakthishanmugam02

Try checking the issue with the replicaset, I guess. Isn't kube-state-metrics-6bdd878bd7 a pod id?

sensay-nelson avatar Feb 03 '19 14:02 sensay-nelson

> kube-state-metrics-6bdd878bd7

No, a pod by that name is not listed in kubectl get pods -n kube-system.

sakthishanmugam02 avatar Feb 03 '19 14:02 sakthishanmugam02

@sensay-nelson one update: I will check further; seems to be some permission issue.

kubectl get events -n kube-system -w
LAST SEEN   TYPE      REASON             KIND         MESSAGE
19m         Warning   FailedCreate       ReplicaSet   Error creating: pods "kube-state-metrics-6bdd878bd7-" is forbidden: error looking up service account kube-system/prometheus: serviceaccount "prometheus" not found
61s         Warning   FailedCreate       ReplicaSet   Error creating: pods "kube-state-metrics-6bdd878bd7-" is forbidden: error looking up service account kube-system/prometheus: serviceaccount "prometheus" not found
3s          Warning   FailedCreate       ReplicaSet   Error creating: pods "kube-state-metrics-6bdd878bd7-" is forbidden: error looking up service account kube-system/prometheus: serviceaccount "prometheus" not found
6m29s       Normal    ScalingReplicaSet  Deployment   Scaled up replica set kube-state-metrics-6bdd878bd7 to 1
8s          Normal    ScalingReplicaSet  Deployment   Scaled up replica set kube-state-metrics-6bdd878bd7 to 1
0s          Warning   FailedCreate       ReplicaSet   Error creating: pods "kube-state-metrics-6bdd878bd7-" is forbidden: error looking up service account kube-system/prometheus: serviceaccount "prometheus" not found

sakthishanmugam02 avatar Feb 03 '19 15:02 sakthishanmugam02

I got it working. Since no namespace was specified in the service account, it was created in default; I updated the namespace to kube-system and the pod deployed. @sensay-nelson thanks for your support.
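For anyone hitting the same error, pinning the namespace in the ServiceAccount manifest itself avoids depending on the current kubectl context:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: kube-system
```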

sakthishanmugam02 avatar Feb 03 '19 16:02 sakthishanmugam02

@sensay-nelson now the prometheus server is up and running; how do I configure the grafana dashboard? I am getting a Bad Gateway error.

sakthishanmugam02 avatar Feb 03 '19 16:02 sakthishanmugam02

@sensay-nelson how do I set up the data source and cluster? Detailed steps, please.

sakthishanmugam02 avatar Feb 03 '19 19:02 sakthishanmugam02

http://docs.grafana.org/features/datasources/prometheus/
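If you want the datasource scripted rather than clicked through, a Grafana provisioning file along these lines works (a sketch; the path and url are assumptions - point the url at whatever Service exposes your prometheus pod):

```yaml
# e.g. /etc/grafana/provisioning/datasources/prometheus.yml
apiVersion: 1
datasources:
- name: prometheus
  type: prometheus
  access: proxy
  url: http://prometheus.kube-system.svc:9090
```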

sensay-nelson avatar Feb 04 '19 05:02 sensay-nelson


I added the data source as prometheus with the :30690 NodePort;
selected CA auth and gave the certificate details from the prometheus configmap, plus skip TLS verify (also tried without these 2 options); message: Data source is working.

I set up the new cluster and chose the created data source, but there are no metrics; all metrics show as N/A and no node or namespace details are listed; an "Unexpected error" pop-up came up in between.

Also the following pop-up: Templating init failed - Cannot read property 'length' of undefined

sakthishanmugam02 avatar Feb 04 '19 06:02 sakthishanmugam02

@sensay-nelson update: some progress;

I am able to see metrics now, but no pod-level metrics... any idea? I'm running the cluster using kubeadm; your previous post explained something about kubeadm and kops; could you please elaborate?

sakthishanmugam02 avatar Feb 04 '19 06:02 sakthishanmugam02

Thanks a lot; I changed the configmap 'replacement' properties and it started working :)

sakthishanmugam02 avatar Feb 04 '19 07:02 sakthishanmugam02