dcgm-exporter
Getting "Error from server (NotFound): the server could not find the metric DCGM_FI_DEV_GPU_UTIL for pods"; I cannot get the DCGM_FI_DEV_GPU_UTIL metric from Prometheus
I have installed the Prometheus stack, Prometheus Adapter, and dcgm-exporter, but when I try to get this metric, I get the following error:
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/*/DCGM_FI_DEV_GPU_UTIL" | jq .
Error from server (NotFound): the server could not find the metric DCGM_FI_DEV_GPU_UTIL for pods
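One way to narrow this down is to ask the custom metrics API which metrics the adapter has registered at all, instead of requesting one metric directly. The sketch below is a hedged illustration, assuming `kubectl` is on the PATH and configured for the cluster; the function names `registered_pod_metrics` and `check_metric` are hypothetical. It reads the discovery document at `/apis/custom.metrics.k8s.io/v1beta1` and collects the metric names listed under the `pods` resource.

```python
import json
import subprocess

# Discovery endpoint of the custom metrics API served by prometheus-adapter.
DISCOVERY_PATH = "/apis/custom.metrics.k8s.io/v1beta1"


def registered_pod_metrics(discovery: dict) -> set:
    """Return metric names registered for the "pods" resource.

    Entries in the discovery document are named "<resource>/<metric>",
    e.g. "pods/DCGM_FI_DEV_GPU_UTIL" or "namespaces/DCGM_FI_DEV_GPU_UTIL".
    """
    names = set()
    for resource in discovery.get("resources", []):
        kind, _, metric = resource.get("name", "").partition("/")
        if kind == "pods" and metric:
            names.add(metric)
    return names


def check_metric(metric: str = "DCGM_FI_DEV_GPU_UTIL") -> bool:
    """Query the live cluster via kubectl (assumed on PATH and configured)."""
    raw = subprocess.run(
        ["kubectl", "get", "--raw", DISCOVERY_PATH],
        check=True, capture_output=True, text=True,
    ).stdout
    metrics = registered_pod_metrics(json.loads(raw))
    if metric in metrics:
        print(f"{metric} is registered for pods")
        return True
    # Show any other DCGM metrics the adapter does expose, to spot naming/rule issues.
    dcgm = sorted(m for m in metrics if m.startswith("DCGM_"))
    print(f"{metric} is not registered; DCGM pod metrics found: {dcgm or 'none'}")
    return False
```

Calling `check_metric()` prints whether the metric is registered for pods. If no DCGM metrics show up at all, that suggests the adapter has no rule matching the DCGM series (or Prometheus has no such series), rather than a problem with the query path itself.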
My setup: I have two node groups in EKS. One is a regular EC2 node group without GPUs, where I installed the Prometheus stack and Prometheus Adapter. The other is a GPU node group, where I installed dcgm-exporter.
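To see whether the split across node groups matters, a first step is to check whether Prometheus has the raw series at all, independent of the adapter. A minimal sketch, assuming Prometheus is reachable at `http://localhost:9090` (for example through `kubectl port-forward`; the URL and the names `summarize` and `check_prometheus` are hypothetical), using the Prometheus HTTP API instant-query endpoint `/api/v1/query`. It also reports whether each series carries `namespace` and `pod` labels; treating those as required is an assumption based on how prometheus-adapter rules commonly map series to pods, so compare against the `rules` in your adapter config.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical address, e.g. after port-forwarding the Prometheus service locally.
PROM_URL = "http://localhost:9090"
# Assumption: labels the adapter's pod rules map on; check your adapter config.
REQUIRED_LABELS = ("namespace", "pod")


def summarize(response: dict) -> list:
    """Return, per series, its labels and which REQUIRED_LABELS it lacks."""
    if response.get("status") != "success":
        raise RuntimeError(f"query failed: {response.get('error')}")
    summary = []
    for series in response["data"]["result"]:
        labels = series["metric"]
        missing = [name for name in REQUIRED_LABELS if name not in labels]
        summary.append({"labels": labels, "missing": missing})
    return summary


def check_prometheus(metric: str = "DCGM_FI_DEV_GPU_UTIL") -> bool:
    """Run an instant query against the Prometheus HTTP API (/api/v1/query)."""
    url = f"{PROM_URL}/api/v1/query?" + urllib.parse.urlencode({"query": metric})
    with urllib.request.urlopen(url, timeout=10) as resp:
        summary = summarize(json.load(resp))
    if not summary:
        print(f"Prometheus has no {metric} series; check the dcgm-exporter scrape target")
        return False
    for item in summary:
        # "instance" is usually the scrape target address Prometheus attaches.
        target = item["labels"].get("instance", "?")
        print(target, "missing labels:", item["missing"] or "none")
    return True
```

If `check_prometheus()` returns series, Prometheus on the non-GPU nodes is already collecting from the exporter on the GPU nodes, so node placement is not blocking collection. If it returns nothing, the gap is on the scraping side rather than in the adapter.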
Could this be the cause? In other words, do I need to install all of these components on the GPU nodes for this to work?