
[Bug] CPU and Events data missing on Kubernetes Dashboard

Open galvesribeiro opened this issue 3 months ago • 6 comments

Hello! We've just deployed ClickStack on a brand new bare-metal K8s cluster. We are using the Helm chart with HyperDX and the OTel collector enabled (we already have an external ClickHouse).

After deploying the chart and installing the cluster-level and node-level OTel collectors as per the documentation, data seems to be flowing. However, a few things look wrong:

  1. In all the Kubernetes dashboard tabs (Pods, Nodes, and Namespaces), the "CPU Usage" charts are empty (Memory Usage works fine).
     (screenshots omitted)
  2. Also, in the second screenshot you can see that all 3 nodes show Status "Not Ready" and CPU "N/A", even though the nodes are clearly healthy and working.
  3. The Pod details view and the Pods tab show no events, even though events clearly exist according to kubectl and other tools such as Lens.
     (screenshots omitted)

Is there anything we are missing when configuring the Kubernetes data, beyond what is mentioned in the docs?

Here are the values applied to each Helm chart:

  • clickstack:
clickhouse:
  enabled: false
global:
  keepPVC: true
  storageClassName: data-block-sc
hyperdx:
  env:
  - name: NODE_EXTRA_CA_CERTS
    value: /etc/ssl/certs/ca-certificates.crt
  existingConfigSecret: clickstack-hyperdx-config
  frontendUrl: https://<redacted>
  image:
    tag: 2.5.0
  podDisruptionBudget:
    enabled: true
  replicas: 3
  useExistingConfigSecret: true
mongodb:
  enabled: true
  image: mongo:8.0.14-noble
  persistence:
    dataSize: 20Gi
    enabled: true
    storageClass: data-block-sc
  port: 27017
otel:
  clickhouseDatabase: clickstack
  clickhouseEndpoint: clickhouse://<host>.clickhouse.svc.cluster.local:9440?secure=true
  clickhousePassword: <redacted>
  clickhouseUser: <redacted>
  enabled: true
  image:
    tag: 2.5.0
  replicas: 3

OTel collector - cluster deployment:

config:
  exporters:
    otlphttp:
      compression: gzip
      endpoint: http://clickstack-hdx-oss-v2-otel-collector:4318
      headers:
        authorization: ${env:API_KEY}
  service:
    pipelines:
      logs:
        exporters:
        - otlphttp
      metrics:
        exporters:
        - otlphttp
extraEnvs:
- name: API_KEY
  valueFrom:
    secretKeyRef:
      key: API_KEY
      name: ingestion-api-key
      optional: true
image:
  repository: ghcr.io/open-telemetry/opentelemetry-collector-releases/opentelemetry-collector-contrib
  tag: 0.136.0
mode: deployment
presets:
  clusterMetrics:
    enabled: true
  kubernetesAttributes:
    enabled: true
    extractAllPodAnnotations: true
    extractAllPodLabels: true
  kubernetesEvents:
    enabled: true
replicaCount: 1

OTel collector - node DaemonSet:

clusterRole:
  create: true
  rules:
  - apiGroups:
    - ""
    resources:
    - nodes/proxy
    - pods
    - nodes
    - namespaces
    verbs:
    - get
    - list
    - watch
config:
  exporters:
    otlphttp:
      compression: gzip
      endpoint: http://clickstack-hdx-oss-v2-otel-collector:4318
      headers:
        authorization: ${env:API_KEY}
  receivers:
    kubeletstats:
      auth_type: serviceAccount
      collection_interval: 20s
      endpoint: ${env:K8S_NODE_NAME}:10250
      insecure_skip_verify: true
      metrics:
        container.uptime:
          enabled: true
        k8s.container.cpu_limit_utilization:
          enabled: true
        k8s.container.cpu_request_utilization:
          enabled: true
        k8s.container.memory_limit_utilization:
          enabled: true
        k8s.container.memory_request_utilization:
          enabled: true
        k8s.node.uptime:
          enabled: true
        k8s.pod.cpu_limit_utilization:
          enabled: true
        k8s.pod.cpu_request_utilization:
          enabled: true
        k8s.pod.memory_limit_utilization:
          enabled: true
        k8s.pod.memory_request_utilization:
          enabled: true
        k8s.pod.uptime:
          enabled: true
  service:
    pipelines:
      logs:
        exporters:
        - otlphttp
      metrics:
        exporters:
        - otlphttp
extraEnvs:
- name: API_KEY
  valueFrom:
    secretKeyRef:
      key: API_KEY
      name: ingestion-api-key
      optional: true
image:
  repository: ghcr.io/open-telemetry/opentelemetry-collector-releases/opentelemetry-collector-contrib
  tag: 0.136.0
mode: daemonset
presets:
  hostMetrics:
    enabled: true
  kubeletMetrics:
    enabled: true
  kubernetesAttributes:
    enabled: true
    extractAllPodAnnotations: true
    extractAllPodLabels: true
  logsCollection:
    enabled: true

Any help would be appreciated.

Thanks!

galvesribeiro · Sep 28 '25

The issue is probably that the CPU usage metric was renamed from cpu.utilization to cpu.usage. It should be a fairly simple change, but it will be a breaking one.
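
One possible stopgap until the dashboard queries are updated is to rename the new metrics back to the old names in the edge collectors before export. This is an untested sketch assuming the contrib transform processor is present in the image you run; verify the OTTL syntax against collector 0.136.0, and note that the processor also has to be listed in the metrics pipeline (the chart's presets may inject additional processors there):

config:
  processors:
    transform/cpu_rename:
      metric_statements:
      - context: metric
        statements:
        # Re-emit the old metric names the dashboard still queries
        # (assumed mapping; adjust to whatever names your dashboard expects).
        - set(name, "k8s.node.cpu.utilization") where name == "k8s.node.cpu.usage"
        - set(name, "k8s.pod.cpu.utilization") where name == "k8s.pod.cpu.usage"
        - set(name, "container.cpu.utilization") where name == "container.cpu.usage"
  service:
    pipelines:
      metrics:
        processors:
        - transform/cpu_rename
        exporters:
        - otlphttp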

jorgeparavicini · Oct 06 '25

Ouch. Nonetheless, it is weird that this was renamed in 2023 and ClickStack/HyperDX is still not updated as of today.

We are starting to have concerns about moving forward with it, since it seems that basic things are not "clicking", and it doesn't seem like the product team pays attention to this repo: this bug report is a week old and nobody has said a word.

Concerning.

galvesribeiro · Oct 06 '25

Just fork the repo and update the metric names; we are building our own custom image to fix this issue. Events work, you just need to make sure that k8s events are enabled in the collector.
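
For reference, in the upstream collector Helm chart this means the kubernetesEvents preset on the deployment-mode (cluster) collector, which feeds Kubernetes events into the logs pipeline; a minimal sketch matching the values already posted above:

presets:
  # Watch cluster-wide Kubernetes events and ship them through the logs pipeline.
  # Keep this on the single-replica deployment-mode collector so events are not duplicated.
  kubernetesEvents:
    enabled: true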

mlsalcedo · Oct 16 '25

Thanks @jorgeparavicini for jumping in.

@galvesribeiro Thanks for the issue report. We are aware of this and fixed it in https://github.com/hyperdxio/hyperdx/pull/1248

Apologies for the delay in response. We are a small team with a lot to do :)

teeohhem · Nov 03 '25


No problem at all! I'm glad it was fixed, thank you!

Which version should we expect this to go into?

galvesribeiro · Nov 03 '25

@galvesribeiro I believe you can try out v2.7.1
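
Assuming v2.7.1 is published for both images used by the clickstack chart, picking up the fix should just be a matter of bumping the tags in the values shown earlier (a sketch, not verified against the chart's release notes):

hyperdx:
  image:
    tag: 2.7.1  # was 2.5.0
otel:
  image:
    tag: 2.7.1  # was 2.5.0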

wrn14897 · Nov 03 '25