[Bug] CPU and Events data missing on Kubernetes Dashboard
Hello! We've just deployed ClickStack on a brand-new bare-metal K8s cluster. We are using the Helm chart with HyperDX and the OTel collector enabled (we already have an external ClickHouse).
After deploying the chart and installing the cluster-level and node-level OTel collectors as per the documentation, data seems to be flowing. However, a few odd things suggest that something is wrong:
- In all the Kubernetes dashboard tabs (pods, nodes and namespace), the "CPU Usage" charts are empty (Memory Usage works fine).
- Also, in the second picture you may see that all 3 nodes have Status "Not Ready", CPU N/A, when clearly the nodes are healthy and working.
- On the Pod details and on the Pods tab, they show no events, when we clearly have events as shown by kubectl and on other tools like Lens.
Is there anything we are missing in configuring the Kubernetes data besides what is mentioned in the docs?
Here are the settings applied to each helm chart:
- clickstack:
clickhouse:
  enabled: false
global:
  keepPVC: true
  storageClassName: data-block-sc
hyperdx:
  env:
  - name: NODE_EXTRA_CA_CERTS
    value: /etc/ssl/certs/ca-certificates.crt
  existingConfigSecret: clickstack-hyperdx-config
  frontendUrl: https://<redacted>
  image:
    tag: 2.5.0
  podDisruptionBudget:
    enabled: true
  replicas: 3
  useExistingConfigSecret: true
mongodb:
  enabled: true
  image: mongo:8.0.14-noble
  persistence:
    dataSize: 20Gi
    enabled: true
    storageClass: data-block-sc
  port: 27017
otel:
  clickhouseDatabase: clickstack
  clickhouseEndpoint: clickhouse://<host>.clickhouse.svc.cluster.local:9440?secure=true
  clickhousePassword: <redacted>
  clickhouseUser: <redacted>
  enabled: true
  image:
    tag: 2.5.0
  replicas: 3
Otel collector - cluster deployment:
config:
  exporters:
    otlphttp:
      compression: gzip
      endpoint: http://clickstack-hdx-oss-v2-otel-collector:4318
      headers:
        authorization: ${env:API_KEY}
  service:
    pipelines:
      logs:
        exporters:
        - otlphttp
      metrics:
        exporters:
        - otlphttp
extraEnvs:
- name: API_KEY
  valueFrom:
    secretKeyRef:
      key: API_KEY
      name: ingestion-api-key
      optional: true
image:
  repository: ghcr.io/open-telemetry/opentelemetry-collector-releases/opentelemetry-collector-contrib
  tag: 0.136.0
mode: deployment
presets:
  clusterMetrics:
    enabled: true
  kubernetesAttributes:
    enabled: true
    extractAllPodAnnotations: true
    extractAllPodLabels: true
  kubernetesEvents:
    enabled: true
replicaCount: 1
Otel collector - Node daemonset:
clusterRole:
  create: true
  rules:
  - apiGroups:
    - ""
    resources:
    - nodes/proxy
    - pods
    - nodes
    - namespaces
    verbs:
    - get
    - list
    - watch
config:
  exporters:
    otlphttp:
      compression: gzip
      endpoint: http://clickstack-hdx-oss-v2-otel-collector:4318
      headers:
        authorization: ${env:API_KEY}
  receivers:
    kubeletstats:
      auth_type: serviceAccount
      collection_interval: 20s
      endpoint: ${env:K8S_NODE_NAME}:10250
      insecure_skip_verify: true
      metrics:
        container.uptime:
          enabled: true
        k8s.container.cpu_limit_utilization:
          enabled: true
        k8s.container.cpu_request_utilization:
          enabled: true
        k8s.container.memory_limit_utilization:
          enabled: true
        k8s.container.memory_request_utilization:
          enabled: true
        k8s.node.uptime:
          enabled: true
        k8s.pod.cpu_limit_utilization:
          enabled: true
        k8s.pod.cpu_request_utilization:
          enabled: true
        k8s.pod.memory_limit_utilization:
          enabled: true
        k8s.pod.memory_request_utilization:
          enabled: true
        k8s.pod.uptime:
          enabled: true
  service:
    pipelines:
      logs:
        exporters:
        - otlphttp
      metrics:
        exporters:
        - otlphttp
extraEnvs:
- name: API_KEY
  valueFrom:
    secretKeyRef:
      key: API_KEY
      name: ingestion-api-key
      optional: true
image:
  repository: ghcr.io/open-telemetry/opentelemetry-collector-releases/opentelemetry-collector-contrib
  tag: 0.136.0
mode: daemonset
presets:
  hostMetrics:
    enabled: true
  kubeletMetrics:
    enabled: true
  kubernetesAttributes:
    enabled: true
    extractAllPodAnnotations: true
    extractAllPodLabels: true
  logsCollection:
    enabled: true
Any help would be appreciated.
Thanks!
The issue is probably that the metric for CPU usage was renamed from cpu.utilization to cpu.usage. It should be a fairly simple change, but it will be a breaking one.
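Until the dashboard is updated, one possible stopgap is to have the collector emit a copy of each renamed metric under its old name. The fragment below is an untested sketch, not a confirmed fix: it uses the contrib collector's metricstransform processor, and the full metric names (k8s.node.cpu.usage, k8s.pod.cpu.usage, container.cpu.usage and their cpu.utilization counterparts) are assumptions based on the kubeletstats receiver's naming, not something verified against the dashboard's queries.

```yaml
config:
  processors:
    metricstransform:
      transforms:
      # "insert" keeps the original metric and adds a copy under new_name.
      # Metric names below are assumed, not taken from this thread.
      - include: k8s.node.cpu.usage
        action: insert
        new_name: k8s.node.cpu.utilization
      - include: k8s.pod.cpu.usage
        action: insert
        new_name: k8s.pod.cpu.utilization
      - include: container.cpu.usage
        action: insert
        new_name: container.cpu.utilization
```

The processor only takes effect once metricstransform is also listed under the metrics pipeline's processors in the node daemonset values. Because the copies duplicate data, they would be worth removing after upgrading to a release that reads the new names.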
Ouch. Nonetheless, it is weird that this was renamed in 2023 and ClickStack/HyperDX has still not been updated as of today.
We are starting to have concerns about moving forward with it, since basic things do not seem to be "clicking", and it doesn't seem like the product team pays attention to this repo: this bug report is a week old and nobody has said a word.
Concerning.
Just fork the repo and update the metrics; we are building our own custom image to fix that issue. Events work, you just need to make sure that you are enabling k8s events in the collector.
Thanks @jorgeparavicini for jumping in.
@galvesribeiro Thanks for the issue report. We are aware of this and fixed it in https://github.com/hyperdxio/hyperdx/pull/1248
Apologies for the delay in response. We are a small team with a lot to do :)
No problem at all! I'm glad it was fixed, thank you!
Which version should we expect this to go into?
@galvesribeiro I believe you can try out v2.7.1