
Fix/dashboards ds prometheus

giany opened this pull request 1 year ago • 10 comments

Type of change

Select the type of your PR

  • Bugfix

Description

When trying to add the Grafana dashboards programmatically (via Helm chart), this error appears:

Failed to upgrade legacy queries Datasource ${DS_PROMETHEUS} was not found

It seems to be related to this issue.
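For reference, the workaround commonly suggested for this error is to declare `DS_PROMETHEUS` as a dashboard templating variable of type `datasource`, so that a provisioned dashboard can resolve `${DS_PROMETHEUS}` itself instead of relying on the import dialog. A minimal sketch (field values are illustrative, not copied from this PR):

```json
{
  "templating": {
    "list": [
      {
        "name": "DS_PROMETHEUS",
        "label": "Prometheus",
        "type": "datasource",
        "query": "prometheus",
        "hide": 0,
        "current": {}
      }
    ]
  }
}
```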

Checklist

Please go through this checklist and make sure all applicable tasks have been done

Tested on EKS. No visual changes done.

giany avatar Aug 26 '22 13:08 giany

Can one of the admins verify this patch?

strimzi-ci avatar Aug 26 '22 13:08 strimzi-ci

First of all: if I manually add the .json files to Grafana, everything works fine, but I'm working on automating the process of adding the dashboards as well. I'm using kube-prometheus-stack to deploy Prometheus/Grafana via Terraform+Terragrunt. The config looks like:

....
  values_dashboard_kafka = <<VALUES
grafana:
  dashboards:
    default:
      strimzi-kafka-exporter:
        url: https://raw.githubusercontent.com/strimzi/strimzi-kafka-operator/main/examples/metrics/grafana-dashboards/strimzi-kafka-exporter.json
      strimzi-zookeeper:
        url: https://raw.githubusercontent.com/strimzi/strimzi-kafka-operator/main/examples/metrics/grafana-dashboards/strimzi-zookeeper.json
      strimzi-kafka:
        url: https://raw.githubusercontent.com/strimzi/strimzi-kafka-operator/main/examples/metrics/grafana-dashboards/strimzi-kafka.json
VALUES

Then helm_release looks like:

resource "helm_release" "kube-prometheus-stack" {
  count                 = local.kube-prometheus-stack["enabled"] ? 1 : 0
  repository            = local.kube-prometheus-stack["repository"]
  name                  = local.kube-prometheus-stack["name"]
  chart                 = local.kube-prometheus-stack["chart"]
  version               = local.kube-prometheus-stack["chart_version"]
  timeout               = local.kube-prometheus-stack["timeout"]
  force_update          = local.kube-prometheus-stack["force_update"]
  recreate_pods         = local.kube-prometheus-stack["recreate_pods"]
  wait                  = local.kube-prometheus-stack["wait"]
  atomic                = local.kube-prometheus-stack["atomic"]
  cleanup_on_fail       = local.kube-prometheus-stack["cleanup_on_fail"]
  dependency_update     = local.kube-prometheus-stack["dependency_update"]
  disable_crd_hooks     = local.kube-prometheus-stack["disable_crd_hooks"]
  disable_webhooks      = local.kube-prometheus-stack["disable_webhooks"]
  render_subchart_notes = local.kube-prometheus-stack["render_subchart_notes"]
  replace               = local.kube-prometheus-stack["replace"]
  reset_values          = local.kube-prometheus-stack["reset_values"]
  reuse_values          = local.kube-prometheus-stack["reuse_values"]
  skip_crds             = local.kube-prometheus-stack["skip_crds"]
  verify                = local.kube-prometheus-stack["verify"]
  values = compact([
    local.values_kube-prometheus-stack,
    local.cert-manager["enabled"] ? local.values_dashboard_cert-manager : null,
    local.cluster-autoscaler["enabled"] ? local.values_dashboard_cluster-autoscaler : null,
    local.ingress-nginx["enabled"] ? local.values_dashboard_ingress-nginx : null,
    local.thanos["enabled"] ? local.values_dashboard_thanos : null,
    local.argo-cd["enabled"] ? local.values_dashboard_argo : null,
    local.strimzi["enabled"] ? local.values_dashboard_kafka : null,
    local.values_dashboard_node_exporter,
    local.kube-prometheus-stack["thanos_sidecar_enabled"] ? local.values_thanos_sidecar : null,
    local.kube-prometheus-stack["thanos_sidecar_enabled"] ? local.values_grafana_ds : null,
    local.kube-prometheus-stack["default_global_requests"] ? local.values_kps_global_requests : null,
    local.kube-prometheus-stack["default_global_limits"] ? local.values_kps_global_limits : null,
    local.kube-prometheus-stack["extra_values"]
  ])
  namespace = kubernetes_namespace.kube-prometheus-stack.*.metadata.0.name[count.index]

  depends_on = [    
    helm_release.ingress-nginx,
    kubectl_manifest.prometheus-operator_crds
  ]
}

Ok, so once I deploy this, the dashboards are also added automatically, but when I try to load them in the browser I get the errors shown in this screenshot: Screenshot 2022-08-26 at 16 29 50

A lot of people complain about this with various other .json dashboards. The PR contains a workaround for this issue.

giany avatar Aug 26 '22 13:08 giany

@scholzj - so, I should do these changes only in the packaging/examples directory and leave examples as is?

giany avatar Aug 26 '22 13:08 giany

> @scholzj - so, I should do these changes only in the packaging/examples directory and leave examples as is?

Yes, correct. The examples folder is updated only during a release.

> A lot of people complain about this with different other .json. PR contains workaround for this issue.

Well, I'm not questioning that it fixes the issue for you. I'm questioning what it does to other users. That is why it would be good to understand what exactly you are adding and why it is the right way.

The dashboards, for example, have this today:

  "__inputs": [
    {
      "name": "DS_PROMETHEUS",
      "label": "Prometheus",
      "description": "",
      "type": "datasource",
      "pluginId": "prometheus",
      "pluginName": "Prometheus"
    }
  ]

How does it fit with it? It looks very similar.
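For context on how these pieces relate (illustrative snippet, not taken from the PR): as far as I understand, the `__inputs` block is only consumed by Grafana's manual import UI, which prompts the user for a datasource and then substitutes `${DS_PROMETHEUS}` wherever panels reference it, e.g.:

```json
{
  "panels": [
    {
      "title": "Example panel",
      "datasource": "${DS_PROMETHEUS}",
      "targets": [
        { "expr": "up" }
      ]
    }
  ]
}
```

When the same JSON is provisioned from files or a sidecar instead of being imported manually, that substitution never happens, which is where the "Datasource ${DS_PROMETHEUS} was not found" error comes from.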


PS: I use the Grafana Operator (https://operatorhub.io/operator/grafana-operator), which does not seem to have any issues loading the dashboards automatically.

scholzj avatar Aug 26 '22 13:08 scholzj

I've pushed the changes to the packaging dir.

This person had the same issue as me: https://github.com/grafana-operator/grafana-operator/issues/75. It was solved in the operator, which is probably why it works for you.

giany avatar Aug 26 '22 14:08 giany

Sorry, but I really think we need to understand what it does and why it does it. When I load it as JSON, it adds a new input field with options that are IMHO unexpected, e.g. data sources which do not exist, such as default. So it seems a bit weird.

scholzj avatar Aug 26 '22 14:08 scholzj

@strimzi/maintainers Does anyone understand Grafana well enough to know what this actually does?

scholzj avatar Aug 26 '22 14:08 scholzj

Tbh I don't know exactly what it does; I just noticed that the PR is reverting the latest changes to the Kafka exporter dashboard about showing N/A instead of 0 when no data is available. This is wrong.

ppatierno avatar Sep 05 '22 10:09 ppatierno

This change adds a new variable that can be referenced in the dashboard JSON. It also adds a new filter field to the dashboard in Grafana:

image

For example, here we added cluster to be able to pick between different datasources. I remember that we had to remove the whole __inputs and datasource parts when we started syncing dashboards via Argo (we basically used the default datasource).

Edit: I did basically the same today in https://github.com/ExcelentProject/sokar/pull/37/ so from my POV it's a good idea to add it.

Frawless avatar Sep 05 '22 14:09 Frawless

So I tried to add it manually to Grafana as well, and it's really weird, because I as a user had to specify the datasource, and the new field doesn't really add any value because it doesn't change anything.

I was able to import it automatically with GrafanaDashboard, but again, if the user doesn't expect to load Strimzi metrics from multiple datasources and switch between them in the dashboard, it looks weird as well.

@giany have you tried to specify the DS_PROMETHEUS value through the chart configuration? There is an option to specify the datasource name in the Grafana chart, so I would expect it to work even with our current version of the dashboards.
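For instance, a sketch against the grafana subchart's values (exact support may vary by chart version): each url-based dashboard entry can carry a `datasource` field, which the chart's dashboard download script substitutes for the `${DS_PROMETHEUS}` token, so the shipped dashboards would not need to change:

```yaml
grafana:
  dashboards:
    default:
      strimzi-kafka:
        url: https://raw.githubusercontent.com/strimzi/strimzi-kafka-operator/main/examples/metrics/grafana-dashboards/strimzi-kafka.json
        datasource: Prometheus
```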

Frawless avatar Sep 10 '22 13:09 Frawless

I think this is relevant to the conversation: https://github.com/grafana/grafana/issues/10786

SuitableProduce avatar Sep 29 '22 14:09 SuitableProduce