grafana-dashboards-kubernetes

[bug] Broken panels on k8s-views-nodes

Open reefland opened this issue 6 months ago • 11 comments

Describe the bug

I've read both issues in the README about broken panels on k8s-views-nodes dashboard.

  • My nodes have fixed IP addresses, so it's not the first issue.
  • The kube_node_info{node} values match the node_uname_info{nodename} values, so it doesn't seem to be the second issue:

Image

When I review the instance variable, the nodename label filter uses a regex that matches nothing. If I set it to just $node, all panels render fine.

This works great:

Image

This default value has broken panels:

Image

Suggestions?

Running K3s: v1.32.5+k3s1 on bare metal Ubuntu 25.04.

How to reproduce?

No response

Expected behavior

No response

Additional context

No response

reefland avatar Jun 04 '25 13:06 reefland

This is because your metrics are missing the cluster label. Your fix works because =~"" actually matches everything, not because there is a valid value for $node and the regex is broken. You must add a cluster label to your metrics and it will work properly.

uhthomas avatar Jun 04 '25 16:06 uhthomas

Following your advice, I added this to prometheus:

prometheusSpec:

    scrapeClasses:
      - default: true
        name: cluster-relabeling
        relabelings:
          - sourceLabels: [ __name__ ]
            regex: (.*)
            targetLabel: cluster
            replacement: k3s-prod
            action: replace
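A minimal sketch of what a `replace` relabeling rule like the one above does to a label set, written in Python for illustration only. `apply_replace` is a hypothetical helper, not a Prometheus API, and it only handles literal replacements (Prometheus also expands `$1`-style references, which this sketch ignores).

```python
import re

def apply_replace(labels, source_labels, regex, target_label, replacement, separator=";"):
    """Simplified model of a relabeling rule with action: replace.

    Hypothetical helper for illustration; handles literal replacements only.
    """
    # Source label values are joined with the separator; missing labels count as "".
    value = separator.join(labels.get(name, "") for name in source_labels)
    # Relabeling regexes are anchored, so the whole value must match.
    if re.fullmatch(regex, value) is None:
        return dict(labels)
    updated = dict(labels)
    updated[target_label] = replacement
    return updated

series = {"__name__": "node_uname_info", "nodename": "node1"}
relabeled = apply_replace(series, ["__name__"], "(.*)", "cluster", "k3s-prod")
print(relabeled["cluster"])  # k3s-prod
```

Because `(.*)` also matches an empty string, the rule adds the cluster label whether or not `__name__` has a value.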

The cluster field is now populated, but many panels are still not working:

Image

The instance variable still does not return any matches:

Image

When I click the [Run Query] button, I get a popup:

Image

reefland avatar Jun 04 '25 17:06 reefland

Can you check that kube_node_info and node_uname_info definitely have the cluster label?

uhthomas avatar Jun 04 '25 18:06 uhthomas

> Can you check that kube_node_info and node_uname_info definitely have the cluster label?

Image

reefland avatar Jun 04 '25 18:06 reefland

And also to confirm we're using the same dashboard revision:

    kubernetes-nodes:
      # renovate: depName="Kubernetes / Views / Nodes"
      gnetId: 15759
      revision: 35
      datasource: Prometheus

reefland avatar Jun 04 '25 18:06 reefland

I think I'm having a similar issue; however, I believe mine is due to the escaping of the backslash in the regex. If I change the regex used in the instance variable to (?i:($node)(\\.[a-z0-9.]+)?) (adding an extra backslash), my query works fine; otherwise I'm spammed with warnings/errors like 1:27: parse error: unknown escape sequence U+002E '.'

> Can you check that kube_node_info and node_uname_info definitely have the cluster label?

Image

What are the current values of the instance label on your node_uname_info metric? I was having issues with it not being aligned with the metrics being used in the panels, i.e. node_cpu_seconds_total.

edit: apologies for the noise, the dashboard works properly for me when using the raw URL https://raw.githubusercontent.com/dotdc/grafana-dashboards-kubernetes/master/dashboards/k8s-views-nodes.json, but I do have an issue with backslash escaping when using gnetId 15759 revision 35. Inspecting the JSON of the dashboard fetched from grafana.com, rather than from the GitHub master branch, shows that the variable isn't properly escaped:

      {
        "current": {
          "text": "192.168.20.101",
          "value": "192.168.20.101"
        },
        "datasource": {
          "type": "prometheus",
          "uid": "${datasource}"
        },
        "definition": "label_values(node_uname_info{nodename=~\"(?i:($node)(\\.[a-z0-9.]+)?)\"}, instance)",
        "hide": 2,
        "includeAll": false,
        "name": "instance",
        "options": [],
        "query": {
          "query": "label_values(node_uname_info{nodename=~\"(?i:($node)(\\.[a-z0-9.]+)?)\"}, instance)",
          "refId": "StandardVariableQuery"
        },
        "refresh": 2,
        "regex": "",
        "sort": 1,
        "type": "query"
      }

dashboards:
  kubernetes:
    kubernetes-nodes:
      gnetId: 15759
      revision: 35
      datasource: Prometheus
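To make the escaping levels concrete, here is a rough Python sketch. It decodes the JSON string the way a JSON parser would, then checks the result against a simplified list of the escapes a Go-style double-quoted string accepts. `unknown_escapes` and `VALID_ESCAPES` are hypothetical names, and the check is only an approximation of what the PromQL lexer does.

```python
import json

# Characters that may follow a backslash in a Go-style double-quoted string.
# Simplified list, assumed here for illustration.
VALID_ESCAPES = set('abfnrtv\\"\'xuU01234567')

def unknown_escapes(string_body: str) -> list:
    """Return the escape sequences a Go-style string lexer would reject."""
    bad = []
    i = 0
    while i < len(string_body):
        if string_body[i] == "\\" and i + 1 < len(string_body):
            following = string_body[i + 1]
            if following not in VALID_ESCAPES:
                bad.append("\\" + following)
            i += 2  # skip the escaped character
        else:
            i += 1
    return bad

# As stored in revision 35: two backslashes in the JSON file.
rev35 = json.loads(r'"(?i:($node)(\\.[a-z0-9.]+)?)"')
# With the extra escaping: four backslashes in the JSON file.
fixed = json.loads(r'"(?i:($node)(\\\\.[a-z0-9.]+)?)"')

print(rev35)                   # (?i:($node)(\.[a-z0-9.]+)?)
print(unknown_escapes(rev35))  # ['\\.']
print(fixed)                   # (?i:($node)(\\.[a-z0-9.]+)?)
print(unknown_escapes(fixed))  # []
```

So two backslashes in the JSON file leave a single `\.` in the query string, which is the sequence the parse error complains about. Four backslashes leave `\\.`, which the query string turns into the regex `\.`.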

jasonpanosso avatar Jun 04 '25 18:06 jasonpanosso

> I think I'm having a similar issue, however I believe mine is due to the escaping of the backslash in the regex. If I change the regex used in the instance variable to (?i:($node)(\\.[a-z0-9.]+)?) (add an extra backslash) my query works fine.

This is it. I also added an extra backslash, (?i:($node)(\\.[a-z0-9.]+)?), and the query works fine and all panels are working again.

reefland avatar Jun 04 '25 19:06 reefland

That was fixed though?

https://github.com/dotdc/grafana-dashboards-kubernetes/releases/tag/v2.7.3

uhthomas avatar Jun 04 '25 19:06 uhthomas

> That was fixed though?
>
> https://github.com/dotdc/grafana-dashboards-kubernetes/releases/tag/v2.7.3

It's fixed in the repo, but I think the version uploaded to grafana.com is out of sync. Try this:

curl -s https://grafana.com/api/dashboards/15759/revisions/latest/download | jq -r '.templating.list[] | select(.name=="instance").definition'

jasonpanosso avatar Jun 04 '25 19:06 jasonpanosso

> edit: apologies for the noise, the dashboard works properly for me when using the raw url https://raw.githubusercontent.com/dotdc/grafana-dashboards-kubernetes/master/dashboards/k8s-views-nodes.json, but I do have an issue regarding backslash escaping when using gnetID 15759 revision 35. Inspecting the JSON from the dashboard fetched from grafana instead of the main gh branch shows that the variable isn't properly escaped:

$ curl -s https://grafana.com/api/dashboards/15759/revisions/35/download | jq -r '.templating.list[] | select(.name=="instance").definition'

label_values(node_uname_info{nodename=~"(?i:($node)(\.[a-z0-9.]+)?)"}, instance)

So it looks like @dotdc just has to push the updated dashboard to Grafana?
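For anyone who prefers to run the same check without jq, a minimal Python sketch of the extraction follows. `variable_definition` is a hypothetical helper, and the `sample` dashboard is a trimmed stand-in (its `node` entry is invented for illustration), not the real download.

```python
import json
from typing import Optional

def variable_definition(dashboard: dict, name: str) -> Optional[str]:
    """Return the definition of the named template variable, if present."""
    for variable in dashboard.get("templating", {}).get("list", []):
        if variable.get("name") == name:
            return variable.get("definition")
    return None

# Trimmed stand-in for the downloaded dashboard JSON (assumed shape).
sample = json.loads(r'''
{"templating": {"list": [
  {"name": "node", "definition": "label_values(kube_node_info, node)"},
  {"name": "instance",
   "definition": "label_values(node_uname_info{nodename=~\"(?i:($node)(\\.[a-z0-9.]+)?)\"}, instance)"}
]}}
''')

print(variable_definition(sample, "instance"))
```

With the revision 35 content this prints the same single-backslash definition as the jq output above.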

reefland avatar Jun 04 '25 19:06 reefland

So, not yet pushed to Grafana?

fmiqbal avatar Jun 24 '25 02:06 fmiqbal

Just merged https://github.com/dotdc/grafana-dashboards-kubernetes/pull/156 to remove all escapes, leaving just the ., which matches any character. The dashboard has been pushed to grafana.com (v37). Let me know if you still have issues.

dotdc avatar Jun 30 '25 21:06 dotdc

@dotdc I don't think this is what was intended? The original change wanted to match a literal '.', not just everything. My understanding is that the latest version to fix it was not published to grafana.com and #156 was not necessary.
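A small Python sketch of the difference between the two forms, assuming the regex after #156 reads (?i:($node)(.[a-z0-9.]+)?), which is my reading of "leaving just the .". Python's `re` stands in for the Prometheus regex engine here. `fullmatch` mimics the anchoring Prometheus applies to label matchers, and the `$node` substitution is a simplified stand-in for Grafana's variable interpolation.

```python
import re

def matches(template: str, node: str, nodename: str) -> bool:
    """Substitute $node and test the whole nodename, as an anchored matcher would."""
    pattern = template.replace("$node", re.escape(node))
    return re.fullmatch(pattern, nodename) is not None

LITERAL_DOT = r"(?i:($node)(\.[a-z0-9.]+)?)"  # dot escaped: matches only "."
ANY_CHAR = r"(?i:($node)(.[a-z0-9.]+)?)"      # dot unescaped: matches any character

for nodename in ["node1", "node1.example.com", "node100"]:
    print(nodename,
          matches(LITERAL_DOT, "node1", nodename),
          matches(ANY_CHAR, "node1", nodename))
```

With `$node` set to node1, the unescaped form also accepts node100, which the literal-dot form rejects.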

uhthomas avatar Jul 01 '25 04:07 uhthomas

Just applied the versions pushed to Grafana and everything is rendering again for me.

reefland avatar Jul 01 '25 12:07 reefland

Yeah, I have no doubt it'll work for the general case, but the feature added for node names containing dots won't work anymore.

uhthomas avatar Jul 01 '25 12:07 uhthomas

I'm joining this issue with a different problem, one that seems obvious but has not been fixed yet. Please add hyphen (-) support to the nodename regex. The current regex excludes valid node names containing hyphens: it fails to match nodes like db-node.prod-cluster. Hyphens are RFC 1123-compliant in hostnames.
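A quick Python sketch of the hyphen problem, using the literal-dot form of the regex and a hypothetical variant with - added to the character class (the exact form a fix would take is an assumption). As before, `fullmatch` approximates Prometheus's anchored matching and the `$node` substitution is simplified.

```python
import re

def matches(template: str, node: str, nodename: str) -> bool:
    """Substitute $node and test the whole nodename, as an anchored matcher would."""
    pattern = template.replace("$node", re.escape(node))
    return re.fullmatch(pattern, nodename) is not None

WITHOUT_HYPHEN = r"(?i:($node)(\.[a-z0-9.]+)?)"
WITH_HYPHEN = r"(?i:($node)(\.[a-z0-9.-]+)?)"  # hypothetical: "-" added to the class

print(matches(WITHOUT_HYPHEN, "db-node", "db-node.prod-cluster"))  # False
print(matches(WITH_HYPHEN, "db-node", "db-node.prod-cluster"))     # True
```

The suffix class [a-z0-9.] stops at the hyphen in prod-cluster, so the anchored match fails until - is allowed.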

tentakle avatar Aug 03 '25 13:08 tentakle

@tentakle you could open a new issue and then open a PR :) Saying that an issue is trivial and complaining that this is not fixed yet is not a great attitude in opensource. 🙏

JordanP avatar Oct 01 '25 19:10 JordanP

Sorry about the delay, I was focused on other projects. Just pushed https://github.com/dotdc/grafana-dashboards-kubernetes/commit/ab1d6d759a405820322c30e6840528e9404f9978 to revert https://github.com/dotdc/grafana-dashboards-kubernetes/pull/156 and add hyphen (-) support to the regex.

The dashboard has been pushed to grafana.com: https://grafana.com/api/dashboards/15759/revisions/39/download

@uhthomas @reefland @jasonpanosso @tentakle Could you confirm it now works for all of you?

dotdc avatar Nov 01 '25 06:11 dotdc