
Fix/dashboards ds prometheus

giany opened this pull request 1 year ago • 10 comments

Type of change

Select the type of your PR

  • Bugfix

Description

When trying to add the Grafana dashboards programmatically (via Helm chart), this error appears:

Failed to upgrade legacy queries Datasource ${DS_PROMETHEUS} was not found

It seems to be related to this issue.
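For reference, the workaround commonly suggested for this error is to declare `DS_PROMETHEUS` as a dashboard templating variable of type `datasource`, so that a provisioned dashboard can resolve `${DS_PROMETHEUS}` itself instead of relying on the import dialog. A minimal sketch (field values are illustrative, not copied from this PR):

```json
{
  "templating": {
    "list": [
      {
        "name": "DS_PROMETHEUS",
        "label": "Prometheus",
        "type": "datasource",
        "query": "prometheus",
        "hide": 0,
        "current": {}
      }
    ]
  }
}
```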

Checklist

Please go through this checklist and make sure all applicable tasks have been done

Tested on EKS. No visual changes done.

giany avatar Aug 26 '22 13:08 giany

Can one of the admins verify this patch?

strimzi-ci avatar Aug 26 '22 13:08 strimzi-ci

First of all: if I manually add the .json files to Grafana, everything works fine, but I'm working on automating the process of adding the dashboards as well. I'm using kube-prometheus-stack to deploy Prometheus/Grafana via Terraform+Terragrunt. The config looks like:

....
  values_dashboard_kafka = <<VALUES
grafana:
  dashboards:
    default:
      strimzi-kafka-exporter:
        url: https://raw.githubusercontent.com/strimzi/strimzi-kafka-operator/main/examples/metrics/grafana-dashboards/strimzi-kafka-exporter.json
      strimzi-zookeeper:
        url: https://raw.githubusercontent.com/strimzi/strimzi-kafka-operator/main/examples/metrics/grafana-dashboards/strimzi-zookeeper.json
      strimzi-kafka:
        url: https://raw.githubusercontent.com/strimzi/strimzi-kafka-operator/main/examples/metrics/grafana-dashboards/strimzi-kafka.json
VALUES

Then helm_release looks like:

resource "helm_release" "kube-prometheus-stack" {
  count                 = local.kube-prometheus-stack["enabled"] ? 1 : 0
  repository            = local.kube-prometheus-stack["repository"]
  name                  = local.kube-prometheus-stack["name"]
  chart                 = local.kube-prometheus-stack["chart"]
  version               = local.kube-prometheus-stack["chart_version"]
  timeout               = local.kube-prometheus-stack["timeout"]
  force_update          = local.kube-prometheus-stack["force_update"]
  recreate_pods         = local.kube-prometheus-stack["recreate_pods"]
  wait                  = local.kube-prometheus-stack["wait"]
  atomic                = local.kube-prometheus-stack["atomic"]
  cleanup_on_fail       = local.kube-prometheus-stack["cleanup_on_fail"]
  dependency_update     = local.kube-prometheus-stack["dependency_update"]
  disable_crd_hooks     = local.kube-prometheus-stack["disable_crd_hooks"]
  disable_webhooks      = local.kube-prometheus-stack["disable_webhooks"]
  render_subchart_notes = local.kube-prometheus-stack["render_subchart_notes"]
  replace               = local.kube-prometheus-stack["replace"]
  reset_values          = local.kube-prometheus-stack["reset_values"]
  reuse_values          = local.kube-prometheus-stack["reuse_values"]
  skip_crds             = local.kube-prometheus-stack["skip_crds"]
  verify                = local.kube-prometheus-stack["verify"]
  values = compact([
    local.values_kube-prometheus-stack,
    local.cert-manager["enabled"] ? local.values_dashboard_cert-manager : null,
    local.cluster-autoscaler["enabled"] ? local.values_dashboard_cluster-autoscaler : null,
    local.ingress-nginx["enabled"] ? local.values_dashboard_ingress-nginx : null,
    local.thanos["enabled"] ? local.values_dashboard_thanos : null,
    local.argo-cd["enabled"] ? local.values_dashboard_argo : null,
    local.strimzi["enabled"] ? local.values_dashboard_kafka : null,
    local.values_dashboard_node_exporter,
    local.kube-prometheus-stack["thanos_sidecar_enabled"] ? local.values_thanos_sidecar : null,
    local.kube-prometheus-stack["thanos_sidecar_enabled"] ? local.values_grafana_ds : null,
    local.kube-prometheus-stack["default_global_requests"] ? local.values_kps_global_requests : null,
    local.kube-prometheus-stack["default_global_limits"] ? local.values_kps_global_limits : null,
    local.kube-prometheus-stack["extra_values"]
  ])
  namespace = kubernetes_namespace.kube-prometheus-stack.*.metadata.0.name[count.index]

  depends_on = [    
    helm_release.ingress-nginx,
    kubectl_manifest.prometheus-operator_crds
  ]
}

Ok, so once I deploy this, the dashboards are also added automatically, but when I try to load them in the browser I get the errors shown in this screenshot: Screenshot 2022-08-26 at 16 29 50

A lot of people complain about this with various other .json dashboards. The PR contains a workaround for this issue.

giany avatar Aug 26 '22 13:08 giany

@scholzj - so, I should do these changes only in the packaging/examples directory and leave examples as is?

giany avatar Aug 26 '22 13:08 giany

> @scholzj - so, I should do these changes only in the packaging/examples directory and leave examples as is?

Yes, correct. The examples folder is updated only during a release.

> A lot of people complain about this with different other .json. PR contains workaround for this issue.

Well, I'm not questioning that it fixes the issue for you. I'm questioning what it does to other users. That is why it would be good to understand what exactly you are adding and why it is the right way.

The dashboards, for example, have this today:

  "__inputs": [
    {
      "name": "DS_PROMETHEUS",
      "label": "Prometheus",
      "description": "",
      "type": "datasource",
      "pluginId": "prometheus",
      "pluginName": "Prometheus"
    }
  ]

How does it fit with it? It looks very similar.
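For context on how these pieces relate (illustrative snippet, not taken from the PR): as far as I understand, the `__inputs` block is only consumed by Grafana's manual import UI, which prompts the user for a datasource and then substitutes `${DS_PROMETHEUS}` wherever panels reference it, e.g.:

```json
{
  "panels": [
    {
      "title": "Example panel",
      "datasource": "${DS_PROMETHEUS}",
      "targets": [
        { "expr": "up" }
      ]
    }
  ]
}
```

When the same JSON is provisioned from files or a sidecar instead of being imported manually, that substitution never happens, which is where the "Datasource ${DS_PROMETHEUS} was not found" error comes from.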


PS: I use the Grafana Operator (https://operatorhub.io/operator/grafana-operator), which does not seem to have any issues loading the dashboards automatically.

scholzj avatar Aug 26 '22 13:08 scholzj

I've pushed the changes to the packaging dir.

This person had the same issue as me: https://github.com/grafana-operator/grafana-operator/issues/75. It was solved in the operator, which is probably why it works for you.

giany avatar Aug 26 '22 14:08 giany

Sorry, but I really think we need to understand what it does and why it does it. When I load it as JSON, it adds a new input field with options that are IMHO unexpected, e.g. data sources which do not exist, such as default. So it seems a bit weird.

scholzj avatar Aug 26 '22 14:08 scholzj

@strimzi/maintainers Does anyone understand Grafana well enough to know what this actually does?

scholzj avatar Aug 26 '22 14:08 scholzj

Tbh I don't know exactly what it does; I just noticed that the PR is reverting the latest changes to the Kafka exporter dashboard about showing N/A instead of 0 when no data is available. This is wrong.

ppatierno avatar Sep 05 '22 10:09 ppatierno

This change adds a new variable that can be referenced in the dashboard JSON. It also adds a new filter field to the dashboard in Grafana:

image

For example, here we added cluster to be able to pick between different datasources. I remember that we had to remove the whole __inputs and datasource parts when we started syncing dashboards via Argo (we basically used the default datasource).

Edit: I did basically the same today in https://github.com/ExcelentProject/sokar/pull/37/ so from my POV it's a good idea to add it.

Frawless avatar Sep 05 '22 14:09 Frawless

So I tried to add it manually to Grafana as well, and it's really weird, because I as a user had to specify the datasource, and the new field doesn't really add any value because it doesn't change anything.

I was able to import it automatically with GrafanaDashboard, but again, if the user doesn't expect to load Strimzi metrics from multiple datasources and switch between them in the dashboard, it looks weird as well.

@giany have you tried to specify the DS_PROMETHEUS value through the chart configuration? There is an option to specify the datasource name in the Grafana chart, so I would expect it to work even with our current version of the dashboards.
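For instance, a sketch against the grafana subchart's values (exact support may vary by chart version): each url-based dashboard entry can carry a `datasource` field, which the chart's dashboard download script substitutes for the `${DS_PROMETHEUS}` token, so the shipped dashboards would not need to change:

```yaml
grafana:
  dashboards:
    default:
      strimzi-kafka:
        url: https://raw.githubusercontent.com/strimzi/strimzi-kafka-operator/main/examples/metrics/grafana-dashboards/strimzi-kafka.json
        datasource: Prometheus
```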

Frawless avatar Sep 10 '22 13:09 Frawless

I think this is relevant to the conversation: https://github.com/grafana/grafana/issues/10786

SuitableProduce avatar Sep 29 '22 14:09 SuitableProduce