grafana dashboards: don't require namespace to be "kafka"

Status: Open. flokli opened this issue 4 years ago • 10 comments

Nothing else in the dashboards requires that metric (vectorized_storage_log_partition_size) to carry this label.

This broke that part of the dashboard for a redpanda cluster we deployed in a namespace != "kafka".

The requirement was initially introduced in commit 68e21f3edd2d48f07030bf7f4680c1572ae0ada4.

flokli, Nov 20 '21

Is there any concern that this will now count all partitions in the cluster, including internal storage partitions?

dotnwat, Nov 22 '21

Namespace is the k8s namespace, no?

flokli, Nov 22 '21

> Namespace is the k8s namespace, no?

I don't think so. I think the "kafka" namespace this PR changes is the one we use internally in our Prometheus metric labels. @twmb, @0x5d: am I way off base here?

dotnwat, Nov 22 '21

> > Namespace is the k8s namespace, no?
>
> I don't think so. I think the kafka namespace being changed in this PR is the namespace we use internally in our prometheus metrics labels. @twmb @0x5d am I way off base here?

@dotnwat is right; this is an internal label. We could look into renaming it if it causes friction with Kubernetes deployments.

That said, I guess the quickest solution is to rename the label to something different in the Prometheus scrape config (or to change the k8s namespace label, but I'm sure that one is used more in most clusters :) ).
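A minimal sketch of that scrape-config rename, assuming a plain Prometheus job whose target relabelling sets a target label called namespace (as the operator-generated config shown later in this thread does). With the default honor_labels: false, Prometheus keeps the target's namespace label and renames the clashing scraped label to exported_namespace, which is why a dashboard query filtering on namespace="kafka" stops matching. The metric_relabel_configs below, which run after that renaming, copy the value into a non-clashing label. The job name and the redpanda_namespace label are hypothetical.

```yaml
scrape_configs:
  - job_name: redpanda          # hypothetical job name
    honor_labels: false         # the default: a clashing scraped label becomes exported_<name>
    # ... service discovery and relabel_configs that set the target label "namespace" ...
    metric_relabel_configs:
      # Copy Redpanda's internal namespace label (e.g. "kafka"), which arrives as
      # exported_namespace, into a label that does not clash with the k8s namespace.
      - source_labels: [exported_namespace]
        regex: (.+)
        target_label: redpanda_namespace
        replacement: $1
      # Drop the exported_ copy now that its value has been preserved.
      - action: labeldrop
        regex: exported_namespace
```

A dashboard would then filter on redpanda_namespace="kafka". Scrapes without a clashing target label keep Redpanda's original namespace label, so the two setups would need different queries.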

0xdiba, Nov 23 '21

> This broke that part of the dashboard for a redpanda cluster we deployed in a namespace != "kafka".

@flokli, can you provide some detail on how it broke and why this PR fixes it?

dotnwat, Nov 23 '21

> That said, I guess the quickest solution is to rename the label in the prometheus scrape config to something different ( or changing the k8s namespace label, but I'm sure that is used more in most clusters :) )

Yeah, I'd say namespace is pretty much a reserved label name: in k8s environments it holds the namespace of the pod being scraped, at least when using grafana-agent-operator and prometheus-operator. From the config they generate:

              relabel_configs:
                - source_labels:
                    - job
                  target_label: __tmp_prometheus_job_name
                - action: keep
                  regex: app-foo
                  source_labels:
                    - __meta_kubernetes_pod_label_name
                - action: keep
                  regex: http
                  source_labels:
                    - __meta_kubernetes_pod_container_port_name
                - source_labels:
                    - __meta_kubernetes_namespace
                  target_label: namespace
                - source_labels:
                    - __meta_kubernetes_service_name
                  target_label: service
                - source_labels:
                    - __meta_kubernetes_pod_name
                  target_label: pod
                - source_labels:
                    - __meta_kubernetes_pod_container_name
                  target_label: container
                - regex: (.+)
                  replacement: $1
                  source_labels:
                    - __meta_kubernetes_pod_label_app
                  target_label: app
                - regex: (.+)
                  replacement: $1
                  source_labels:
                    - __meta_kubernetes_pod_label_name
                  target_label: name
                - replacement: default/app-foo-metrics
                  target_label: job
                - replacement: http
                  target_label: endpoint
                - action: hashmod
                  modulus: 1
                  source_labels:
                    - __address__
                  target_label: __tmp_hash
                - action: keep
                  regex: 0
                  source_labels:
                    - __tmp_hash

There are a lot of Kubernetes-specific dashboards out there that also rely on this convention, so it's unlikely to change any time soon.

If the dashboard needs to filter vectorized_storage_log_partition_size down to only the "kafka" series, maybe renaming that label inside redpanda to something that doesn't clash with these label names (kafka_namespace, say) would be the right call?

Instead of changing it in the source code, we could rename it via relabelling in the Prometheus scrape config, yes. But redpanda-operator doesn't install scrape configs (only the other helm chart does), and some people run redpanda outside k8s and scrape its metrics manually, without knowing about any relabelling they'd need to do. Their dashboards would then need to look different.
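For prometheus-operator users, the relabelling alternative could ship with a chart as a ServiceMonitor. A minimal sketch, assuming a Service labelled app: redpanda that exposes a port named admin serving /metrics; those names and the redpanda_namespace target label are hypothetical, not taken from any existing redpanda chart.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: redpanda                # hypothetical name
spec:
  selector:
    matchLabels:
      app: redpanda             # hypothetical Service label
  endpoints:
    - port: admin               # hypothetical port name on the Service
      path: /metrics
      metricRelabelings:
        # The operator sets the target label "namespace" to the pod's k8s
        # namespace, so Redpanda's own label arrives as exported_namespace.
        - sourceLabels: [exported_namespace]
          regex: (.+)
          targetLabel: redpanda_namespace
          replacement: $1
        - action: labeldrop
          regex: exported_namespace
```

This only helps clusters that install the ServiceMonitor; it does nothing for the manual, non-k8s scrapes described above.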

This should probably be changed in redpanda itself.

flokli, Nov 24 '21

Thanks for the detail, @flokli. This is definitely outside my area of expertise, so I'll defer to @0xdiba, @twmb and @0x5d. My only concern is whether we need a plan for upgrading deployed Grafana dashboards after the namespace label changes.

dotnwat, Nov 24 '21

Any update?

flokli, Dec 02 '21

Hey @flokli, we are discussing what a migration could look like and how breaking this change would be.

0xdiba, Dec 02 '21

CLA assistant check
All committers have signed the CLA.

CLAassistant, Mar 02 '22

Grafana dashboards are now maintained in the redpanda-data/observability repo.

twmb, Apr 18 '23