strimzi-kafka-operator
Fix/dashboards ds prometheus
Type of change
Select the type of your PR
- Bugfix
Description
When trying to add the Grafana dashboards programmatically (via Helm chart), this error appears:

```
Failed to upgrade legacy queries Datasource ${DS_PROMETHEUS} was not found
```
Seems it's related to this issue.
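For context, this is a minimal sketch of the dashboard shape that triggers the error (panel fields abbreviated and hypothetical): the panels reference the datasource through a `${DS_PROMETHEUS}` placeholder that only Grafana's interactive import dialog resolves, so when the JSON is provisioned directly (as the Helm chart sidecar does), nothing substitutes the placeholder and Grafana reports the datasource as missing.

```json
{
  "__inputs": [
    { "name": "DS_PROMETHEUS", "type": "datasource", "pluginId": "prometheus" }
  ],
  "panels": [
    { "title": "Example panel", "datasource": "${DS_PROMETHEUS}" }
  ]
}
```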
Checklist
Please go through this checklist and make sure all applicable tasks have been done
Tested on EKS. No visual changes were made.
Can one of the admins verify this patch?
First of all: if I manually add the .json files to Grafana, everything works fine... but I'm working on automating the process of adding the dashboards as well. I'm using kube-prometheus-stack to deploy Prometheus/Grafana via Terraform+Terragrunt. The config looks like:
....
```hcl
values_dashboard_kafka = <<VALUES
grafana:
  dashboards:
    default:
      strimzi-kafka-exporter:
        url: https://raw.githubusercontent.com/strimzi/strimzi-kafka-operator/main/examples/metrics/grafana-dashboards/strimzi-kafka-exporter.json
      strimzi-zookeeper:
        url: https://raw.githubusercontent.com/strimzi/strimzi-kafka-operator/main/examples/metrics/grafana-dashboards/strimzi-zookeeper.json
      strimzi-kafka:
        url: https://raw.githubusercontent.com/strimzi/strimzi-kafka-operator/main/examples/metrics/grafana-dashboards/strimzi-kafka.json
VALUES
```
Then the `helm_release` looks like:
```hcl
resource "helm_release" "kube-prometheus-stack" {
  count = local.kube-prometheus-stack["enabled"] ? 1 : 0

  repository            = local.kube-prometheus-stack["repository"]
  name                  = local.kube-prometheus-stack["name"]
  chart                 = local.kube-prometheus-stack["chart"]
  version               = local.kube-prometheus-stack["chart_version"]
  timeout               = local.kube-prometheus-stack["timeout"]
  force_update          = local.kube-prometheus-stack["force_update"]
  recreate_pods         = local.kube-prometheus-stack["recreate_pods"]
  wait                  = local.kube-prometheus-stack["wait"]
  atomic                = local.kube-prometheus-stack["atomic"]
  cleanup_on_fail       = local.kube-prometheus-stack["cleanup_on_fail"]
  dependency_update     = local.kube-prometheus-stack["dependency_update"]
  disable_crd_hooks     = local.kube-prometheus-stack["disable_crd_hooks"]
  disable_webhooks      = local.kube-prometheus-stack["disable_webhooks"]
  render_subchart_notes = local.kube-prometheus-stack["render_subchart_notes"]
  replace               = local.kube-prometheus-stack["replace"]
  reset_values          = local.kube-prometheus-stack["reset_values"]
  reuse_values          = local.kube-prometheus-stack["reuse_values"]
  skip_crds             = local.kube-prometheus-stack["skip_crds"]
  verify                = local.kube-prometheus-stack["verify"]

  values = compact([
    local.values_kube-prometheus-stack,
    local.cert-manager["enabled"] ? local.values_dashboard_cert-manager : null,
    local.cluster-autoscaler["enabled"] ? local.values_dashboard_cluster-autoscaler : null,
    local.ingress-nginx["enabled"] ? local.values_dashboard_ingress-nginx : null,
    local.thanos["enabled"] ? local.values_dashboard_thanos : null,
    local.argo-cd["enabled"] ? local.values_dashboard_argo : null,
    local.strimzi["enabled"] ? local.values_dashboard_kafka : null,
    local.values_dashboard_node_exporter,
    local.kube-prometheus-stack["thanos_sidecar_enabled"] ? local.values_thanos_sidecar : null,
    local.kube-prometheus-stack["thanos_sidecar_enabled"] ? local.values_grafana_ds : null,
    local.kube-prometheus-stack["default_global_requests"] ? local.values_kps_global_requests : null,
    local.kube-prometheus-stack["default_global_limits"] ? local.values_kps_global_limits : null,
    local.kube-prometheus-stack["extra_values"]
  ])

  namespace = kubernetes_namespace.kube-prometheus-stack.*.metadata.0.name[count.index]

  depends_on = [
    helm_release.ingress-nginx,
    kubectl_manifest.prometheus-operator_crds
  ]
}
```
Ok, so once I deploy this, the dashboards are also added automatically... but when I try to load them in the browser I get the errors shown in this image.
A lot of people complain about this with various other .json dashboards. This PR contains a workaround for the issue.
@scholzj - so, I should do these changes only in the packaging/examples directory and leave examples as is?
> @scholzj - so, I should do these changes only in the packaging/examples directory and leave examples as is?
Yes, correct. The `examples` folder is updated only during a release.
> A lot of people complain about this with various other .json dashboards. This PR contains a workaround for the issue.
Well, I'm not questioning that it fixes the issue for you. I'm questioning what it does to other users. That is why it would be good to understand what exactly you are adding and why it is the right way.
The dashboards, for example, have this today:

```json
"__inputs": [
  {
    "name": "DS_PROMETHEUS",
    "label": "Prometheus",
    "description": "",
    "type": "datasource",
    "pluginId": "prometheus",
    "pluginName": "Prometheus"
  }
]
```
How does it fit with it? It looks very similar.
PS: I use the Grafana Operator (https://operatorhub.io/operator/grafana-operator), which does not seem to have any issues loading the dashboards automatically.
I've pushed the changes to the packaging dir.
This person had the same issue as me: https://github.com/grafana-operator/grafana-operator/issues/75. It was solved in the operator... probably that's why it works for you.
Sorry, but I really think we need to understand what it does and why it does it. When I load it as JSON, it adds a new input field with some options which are IMHO unexpected -> with data sources which do not exist, such as `default`. So it seems a bit weird.
@strimzi/maintainers Anyone understands Grafana enough to know what this actually does?
Tbh I don't know exactly what it does, I just noticed that the PR is reverting the latest changes on the Kafka exporter dashboard about showing `N/A` instead of `0` when no data are available. This is wrong.
This change adds a new variable that can be referenced in the dashboard JSON. It also adds a new filter field to the dashboard in Grafana:
![image](https://user-images.githubusercontent.com/9933303/188473631-86033b47-d273-4585-a37f-a3ac6b4a96bb.png)
For example, here we added `cluster` to be able to pick between different datasources. I remember that we had to remove the whole `__inputs` and `datasource` parts when we started syncing dashboards via Argo (we basically used the default datasource).
Edit: I did basically the same today in https://github.com/ExcelentProject/sokar/pull/37/ so from my POV it's a good idea to add it.
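If I understand the workaround correctly, it complements rather than replaces `__inputs`: a dashboard-level templating variable of type `datasource` is added, so `${DS_PROMETHEUS}` resolves when the dashboard loads instead of only during an interactive import. A rough sketch of that shape (exact field values may differ from what this PR actually adds):

```json
"templating": {
  "list": [
    {
      "name": "DS_PROMETHEUS",
      "label": "Prometheus",
      "type": "datasource",
      "query": "prometheus",
      "hide": 0
    }
  ]
}
```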
So I tried to also add it manually to Grafana and it's really weird, because I as a user had to specify the datasource, and the new field doesn't add much value because it doesn't change anything.
I was able to import it automatically with a `GrafanaDashboard` resource, but again, if the user doesn't expect to load Strimzi metrics from multiple datasources and switch between them in the dashboard, it looks weird as well.
@giany have you tried to specify the `DS_PROMETHEUS` value through chart configuration? There is an option to specify the datasource name in the Grafana chart, so I would expect it to work even with our current version of the dashboards.
I think this is relevant to the conversation: https://github.com/grafana/grafana/issues/10786