dcgm-exporter
NVIDIA GPU metrics exporter for Prometheus leveraging DCGM
Bumps [golang.org/x/net](https://github.com/golang/net) from 0.20.0 to 0.23.0. Commits c48da13 http2: fix TestServerContinuationFlood flakes 762b58d http2: fix tipos in comment ba87210 http2: close connections when receiving too many headers ebc8168 all: fix...
Hello all. Docker image: nvcr.io/nvidia/k8s/dcgm-exporter, tag 3.2.6-3.1.9-ubi8. OS: CentOS 8.4. CUDA: 12.2. Driver: 535.104.05. [Screenshots: describe, docker-logs, kubectl-pod-A, nvidia-smi]
### What is the version? 3.3.5-3.4.1-ubi9 ### What happened? We are running dcgm-exporter under containerd, and on `g5.48xlarge` dcgm-exporter is struggling to come online with this error ```...
NOTE: this makes two breaking changes to the 'install' target: 1. Files are now installed to "standard" locations such as /usr/local/bin/ instead of /usr/bin/. To get back to the old...
### What is the version? 3.3.5-3.4.1 ### What happened? Metrics like `DCGM_FI_PROF_GR_ENGINE_ACTIVE` are only exposed for one single pod even though there are multiple pods that use the same GPU...
### Is this a new feature, an improvement, or a change to existing functionality? Improvement ### Please provide a clear description of the problem this feature solves We need to...
Using the calculation of `sum` for the power gauge results in summing the gauge over time, rather than reflecting the current amount as a gauge should. Swap it for `lastNotNull`.
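To illustrate why `sum` is the wrong reduction for a gauge, here is a minimal Python sketch. The function names and the sample power readings are hypothetical and made up for illustration; this is not the dashboard's actual implementation, only the arithmetic behind the two reductions.

```python
from typing import Optional, Sequence


def reduce_sum(samples: Sequence[Optional[float]]) -> float:
    """Add up every non-null sample in the window (grows with window length)."""
    return sum(v for v in samples if v is not None)


def reduce_last_not_null(samples: Sequence[Optional[float]]) -> Optional[float]:
    """Return the most recent non-null sample, or None if there is none."""
    for v in reversed(samples):
        if v is not None:
            return v
    return None


# Hypothetical power readings in watts; the last scrape is missing.
power_watts = [250.0, 255.0, 260.0, None]

print(reduce_sum(power_watts))            # total of all readings, not a current value
print(reduce_last_not_null(power_watts))  # latest known reading
```

With these readings, the summed value is roughly three times the real power draw and keeps growing as the time window widens, while the last non-null value stays at the latest reading.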
I have an A800 device. When I enable MIG mode, `dcgm-exporter` does not expose any metric that tells us whether the device's current MIG mode is single or mixed.
Bumps [github.com/docker/docker](https://github.com/docker/docker) from 24.0.7+incompatible to 24.0.9+incompatible. Release notes Sourced from github.com/docker/docker's releases. v24.0.9 24.0.9 For a full list of pull requests and changes in this release, refer to the relevant...
Need to be able to configure both:
* https://github.com/NVIDIA/dcgm-exporter/blob/main/deployment/templates/daemonset.yaml#L109
* https://github.com/NVIDIA/dcgm-exporter/blob/main/deployment/templates/daemonset.yaml#L114

The initial delay is too small on AKS.