grafana-dashboards-kubernetes

Cluster variable

Open tesharp opened this issue 2 years ago • 9 comments

Thanks for very nice dashboards.

One thing that may be missing is a "cluster" variable. With multiple clusters, it is useful to limit the scope to a single cluster: a multi-select variable that accepts "All", with queries adding cluster=~"$cluster".
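A minimal sketch of what such a multi-select variable would turn into once it is substituted into a regex matcher. The helper names (`expand_multi_value`, `build_selector`) are hypothetical, the handling of an empty selection as "All" is an assumption, and the real escaping done by Grafana is not modeled: only names made of letters, digits, `_` and `-` are accepted here.

```python
import re

# Assumption: cluster names only use characters that need no regex escaping.
SAFE_NAME = re.compile(r"[A-Za-z0-9_-]+")


def expand_multi_value(selected):
    """Turn the selected cluster names into a regex for a =~ matcher."""
    for name in selected:
        if not SAFE_NAME.fullmatch(name):
            raise ValueError(f"unsupported cluster name: {name!r}")
    if not selected:
        # Assumption: an empty selection stands for "All".
        return ".*"
    if len(selected) == 1:
        return selected[0]
    return "(" + "|".join(selected) + ")"


def build_selector(metric, selected):
    """Build a selector such as kube_node_info{cluster=~"(a|b)"}."""
    return f'{metric}{{cluster=~"{expand_multi_value(selected)}"}}'


if __name__ == "__main__":
    print(build_selector("kube_node_info", ["prod", "staging"]))
    print(build_selector("kube_node_info", ["prod"]))
```

The alternation form is why the proposal uses `=~` rather than `=`: an equality matcher cannot express more than one selected cluster.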

tesharp avatar Jul 07 '22 13:07 tesharp

Hi @tesharp, you can achieve this by using the Datasource variable; just configure each of your clusters as datasources.

dotdc avatar Jul 09 '22 09:07 dotdc

Hi @dotdc, first off, I really think this is great stuff. I came here from your Medium post, and I'm glad some of us show that FR people can do stuff too :)

Anyway, even though you're right about datasources, there is the use case of people using Prometheus federation, or tools like Thanos (or even Grafana Mimir, but I haven't tried it), to achieve global monitoring of several clusters from a single point. In the case of Thanos, which I'm familiar with, we get all the data in the same "bucket", so our Grafanas usually point to one datasource to visualize several clusters.

I was about to open an issue before finding this one, I'm keen to propose a PR adding a cluster var and let you decide whether or not you want to integrate it, wdyt ?

cebidhem avatar Jul 11 '22 11:07 cebidhem

Thank you @cebidhem! I understand the need and also used the $cluster variable in the past. The problem is that the cluster label is not available by default, I guess you have a custom rule that adds it. If I add the cluster variable and label in my dashboards, they will not work without it, and will break the dashboards for most users.

I don't think you can create a PromQL query with an optional label, if any of you know a way of doing this, I'm really interested. If you have other ideas to solve this, we can discuss it here.

dotdc avatar Jul 12 '22 07:07 dotdc

That makes complete sense. I'll probably fork the project and add the variable in our fork, at least for now.

cebidhem avatar Jul 15 '22 15:07 cebidhem

I'd be happy to include this feature if you make a tool or script that injects the necessary variables and labels to the dashboards.

This way, everyone will benefit from your work and it will also be easier for you to update.

dotdc avatar Jul 15 '22 16:07 dotdc

We could use GitHub Actions to publish a set of new dashboards, maybe prefixed with multicluster-, with a cluster variable in the queries ?

That way, it would be possible to have both types of dashboards in the same repo, and even publish both versions on Grafana. But at the very least, they would be here to be consumed by the Grafana configuration as raw files.

I've never really worked with Actions as I'm mostly using GitLab, but I could give it a try.

I guess a quite simple working solution could be a shell script doing some sed.

cebidhem avatar Jul 15 '22 22:07 cebidhem

That was the idea, if you create a tool that can inject the labels & variables in the script, I will make a special release for them.

dotdc avatar Jul 18 '22 09:07 dotdc

@dotdc how about using jsonnet? I know that https://github.com/grafana/grafonnet-lib doesn't support Grafana 9, but I can try to build dashboard generation on bare jsonnet, without grafonnet.

I can prepare changes if you are fine with the approach

k1rk avatar Oct 03 '22 13:10 k1rk

Hi @k1rk,

The only requirement I have is that the source must remain flat JSON Grafana dashboards. If you can provide a script, in any language, that can generate a copy of the dashboards and inject the missing cluster label in them, it would be the perfect solution to me.

Let me know.

dotdc avatar Oct 03 '22 14:10 dotdc
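A rough sketch of the kind of script requested above, assuming that a regex rewrite of each `expr` field is acceptable. It returns a converted copy of a dashboard and leaves the source JSON untouched. The function names and the `CLUSTER_VARIABLE` definition (including its `label_values` query) are hypothetical. The rewrite is deliberately crude: it only touches selectors that already have braces, so a bare metric name such as `up` is left alone, and it skips any expression that already contains `cluster=`.

```python
import json
import re

CLUSTER_MATCHER = 'cluster="$cluster"'
# "{" opening a label selector; the "${var}" variable syntax is skipped.
SELECTOR_OPEN = re.compile(r"(?<!\$)\{\s*(\}?)")

# Hypothetical variable definition; the field values are assumptions.
CLUSTER_VARIABLE = {
    "name": "cluster",
    "type": "query",
    "query": "label_values(kube_node_info, cluster)",
    "includeAll": False,
    "multi": False,
}


def inject_matcher(expr):
    """Add the cluster matcher to every braced selector in a query."""
    if "cluster=" in expr:
        return expr  # crude guard against injecting twice

    def repl(match):
        # "{}" becomes "{cluster=...}", "{a=..." becomes "{cluster=..., a=..."
        return "{" + CLUSTER_MATCHER + ("}" if match.group(1) else ", ")

    return SELECTOR_OPEN.sub(repl, expr)


def convert(node):
    """Return a copy of the JSON tree with every "expr" string rewritten."""
    if isinstance(node, dict):
        return {
            key: inject_matcher(value)
            if key == "expr" and isinstance(value, str)
            else convert(value)
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [convert(item) for item in node]
    return node


def add_cluster_variable(dashboard):
    """Convert a dashboard dict and make sure it declares $cluster."""
    result = convert(dashboard)
    variables = result.setdefault("templating", {}).setdefault("list", [])
    if not any(v.get("name") == "cluster" for v in variables):
        variables.insert(0, dict(CLUSTER_VARIABLE))
    return result


if __name__ == "__main__":
    source = {
        "panels": [
            {"targets": [{"expr": 'sum(kube_pod_info{namespace="$namespace"}) by (pod)'}]}
        ]
    }
    print(json.dumps(add_cluster_variable(source), indent=2))
```

A proper PromQL parser would be more robust than a regex, but a sketch like this keeps the flat JSON dashboards as the single source and generates the multi-cluster copies from them.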

Did anyone manage to script this? Would love to add the functionality for multi-cluster support.

Excellent dashboards btw, thanks!

keith-e-munro avatar Jul 06 '23 23:07 keith-e-munro

I have the same issue, too.

pingping95 avatar Jul 07 '23 14:07 pingping95

Hi @keith-e-munro,

I don't think anyone did it yet, but we can discuss the topic further if you're willing to do it.

I personally use one datasource per cluster, even when using Thanos, so I don't have to rely on the cluster variable.

dotdc avatar Jul 07 '23 18:07 dotdc

First of all, very nice set of dashboards 👏 .

Not sure if I have all the context, but non-existing labels should not break the query, at least not in recent versions of Prometheus, e.g. > 2.45.0 (I could not find whether this was a fix at some point). So adding a variable with an empty default value and a matcher like cluster="$cluster" in the dashboard queries will work, since a series that does not have the label is still returned: it matches {cluster=""}.
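A small Python model of that matching rule, not Prometheus code: a matcher against the empty string also selects series that do not carry the label at all. The function names and the sample series are made up for illustration.

```python
def matches_equal(series_labels, label, value):
    """Equality matcher: a missing label is treated as an empty value."""
    return series_labels.get(label, "") == value


def select(series_list, label, value):
    """Keep the series whose label equals the given value."""
    return [s for s in series_list if matches_equal(s, label, value)]


# One series without a cluster label, one with it.
SERIES = [
    {"__name__": "kube_node_info", "node": "a"},
    {"__name__": "kube_node_info", "node": "b", "cluster": "prod"},
]

if __name__ == "__main__":
    # $cluster left empty: only the unlabelled series is selected.
    print(select(SERIES, "cluster", ""))
    # $cluster set to "prod": only the labelled series is selected.
    print(select(SERIES, "cluster", "prod"))
```

On a setup where no series has a cluster label, an empty `$cluster` therefore selects everything, which is what keeps the dashboards working for users without the label.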

mihaico avatar Aug 04 '23 09:08 mihaico

This is really interesting @mihaico, just tested on 2.46.0 and it works! I'm pretty sure it wasn't the case before, so I would be really interested to know when it changed, or if I just didn't see the elephant in the room :sweat_smile:

We could pick this solution if the change is old enough, otherwise I would wait a little because 2.45.0 will probably be a bit too recent for most users.

dotdc avatar Aug 04 '23 13:08 dotdc

I am not sure whether this is related, but I am experiencing an issue where the dashboard picks up non-Kubernetes nodes (in this case an OpenWrt node-exporter). It also messes up metrics like CPU usage, averaging them together as if it were a k8s node.

It incorrectly shows the other, non-Kubernetes node. image

As I use kube-prometheus-stack, I am adding the OpenWrt node-exporter with Prometheus Operator CRDs. Could it be that it is picked up as a k8s node because it is defined as a service endpoint?

kind: Service
apiVersion: v1
metadata:
  name: openwrt-prometheus-metrics
  labels:
    app: openwrt-prometheus-metrics
spec:
  type: ClusterIP
  ports:
    - name: metrics
      port: 9100
      targetPort: 9100
---
kind: Endpoints
apiVersion: v1
metadata:
  name: openwrt-prometheus-metrics
subsets:
  - addresses:
      - ip: 10.1.20.1
    ports:
      - name: metrics
        port: 9100
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: openwrt-servicemonitor
spec:
  selector:
    matchLabels:
      app: openwrt-prometheus-metrics
  endpoints:
    - port: metrics
      interval: 30s
      path: /metrics
  namespaceSelector:
    any: true

AndrisJrs avatar Sep 13 '23 21:09 AndrisJrs

Hi @AndrisJrs,

There is no differentiating label for Kubernetes nodes here. I would recommend running a dedicated Prometheus instance for your cluster.

In order to keep the issue on topic, please open a dedicated issue if you want to discuss it further.

dotdc avatar Sep 16 '23 07:09 dotdc

image

@dotdc I retested this on Prometheus 2.0.0 (docker run --rm -ti -p9090:9090 prom/prometheus:v2.0.0) and cluster="" still works there. From my point of view, this should unblock the feature request.

jkroepke avatar Oct 14 '23 13:10 jkroepke

Sorry for the delay, I know this has been asked for quite some time.

@jkroepke thank you for this input, I think it's safe enough to give it a shot. I'll convert the global view and share it at the beginning of next week so you can test it and let me know if it works on your various setups.

dotdc avatar Dec 15 '23 15:12 dotdc

Just added the cluster variable in k8s-views-global.json in this PR: https://github.com/dotdc/grafana-dashboards-kubernetes/pull/78

If anyone has time, please try it and let me know if it works on your side.

dotdc avatar Dec 18 '23 10:12 dotdc

I gave some feedback on the PR.

jkroepke avatar Dec 18 '23 11:12 jkroepke

Thank you @jkroepke, just made the corresponding change :+1: Let me know if you notice anything else.

dotdc avatar Dec 18 '23 16:12 dotdc

Just did the namespaces, nodes and pods views (https://github.com/dotdc/grafana-dashboards-kubernetes/pull/82). I'll wait for more feedback before doing the same for the remaining dashboards.

dotdc avatar Dec 18 '23 22:12 dotdc

How do I add an additional label in Prometheus Operator? "cluster" isn't a default label. image

AndrisJrs avatar Dec 18 '23 22:12 AndrisJrs

Hi @AndrisJrs, you can add this label using static_configs or relabel_configs, but only if you need it. This addition was made for Thanos users (or similar) using a single datasource for multiple clusters (global queries). All PromQL queries should work without this label, let me know if it's not the case on your setup.

dotdc avatar Dec 18 '23 22:12 dotdc

Hi @dotdc,

I looked into #82. Since multi-value is dropped, the query can be optimized to cluster="$cluster" instead of cluster=~"$cluster". Not sure if it really has an impact.
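A short illustration of the difference between the two matchers for a single value. It uses Python's `re` module as a stand-in for the regex engine, which is an assumption that holds for simple patterns like these; the function names are hypothetical. Prometheus regex matchers are fully anchored, which `re.fullmatch` mirrors.

```python
import re


def match_exact(actual, wanted):
    """Model of cluster="wanted": plain string comparison."""
    return actual == wanted


def match_regex(actual, pattern):
    """Model of cluster=~"pattern": the whole value must match."""
    return re.fullmatch(pattern, actual) is not None


if __name__ == "__main__":
    # For a plain name, both matchers select the same values.
    for actual in ("prod", "prod2", ""):
        print(actual, match_exact(actual, "prod"), match_regex(actual, "prod"))

    # With a regex metacharacter in the value, they can differ.
    print(match_exact("prod-eu", "prod.eu"), match_regex("prod-eu", "prod.eu"))
```

With a single-value variable the equality form also avoids any surprise from characters such as `.` being interpreted as regex syntax.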

jkroepke avatar Dec 18 '23 22:12 jkroepke

True, made the change in #84

dotdc avatar Dec 19 '23 09:12 dotdc

Hi @dotdc, thanks for the tip. It works fine without the label. I wanted to see whether it would fix the non-Kubernetes node-exporters leaking into the dashboard, as I reported earlier in this thread. It works perfectly after adding relabelings. Thank you 👍

AndrisJrs avatar Dec 20 '23 18:12 AndrisJrs

Just merged https://github.com/dotdc/grafana-dashboards-kubernetes/pull/90 with the missing dashboards. :tada:

Please open a new issue if you see any bugs related to this.

Thank you all for your ideas !

dotdc avatar Jan 04 '24 10:01 dotdc

:tada: This issue has been resolved in version 1.1.0 :tada:

The release is available on GitHub release

Your semantic-release bot :package::rocket:

dotdc avatar Apr 25 '24 21:04 dotdc