dcgm-exporter

NVIDIA GPU metrics exporter for Prometheus leveraging DCGM

Results 77 dcgm-exporter issues

Bumps [golang.org/x/net](https://github.com/golang/net) from 0.20.0 to 0.23.0. Commits c48da13 http2: fix TestServerContinuationFlood flakes 762b58d http2: fix tipos in comment ba87210 http2: close connections when receiving too many headers ebc8168 all: fix...

dependencies

Hello all, docker image: nvcr.io/nvidia/k8s/dcgm-exporter -3.2.6-3.1.9-ubi8, OS: CentOS 8.4, cuda-12.2, Driver-535.104.05 ![describe](https://github.com/NVIDIA/dcgm-exporter/assets/119838086/d8cc6316-7609-4c2a-bb5a-2894b6b4875f) ![docker-logs](https://github.com/NVIDIA/dcgm-exporter/assets/119838086/b1283c97-27b9-4060-909e-7d651da81e1d) ![kubectl-pod-A](https://github.com/NVIDIA/dcgm-exporter/assets/119838086/387623ca-e35b-416c-a2c1-d1d946fda080) ![nvidia-smi](https://github.com/NVIDIA/dcgm-exporter/assets/119838086/8360fcf7-d598-4ea7-bdfc-d8c668d738ce)

### What is the version? 3.3.5-3.4.1-ubi9 ### What happened? We are running dcgm-exporter inside containerd, and on `g5.48xlarge` dcgm-exporter is struggling to come online with this error ```...

bug

NOTE: this makes two breaking changes to the 'install' target: 1. Files are now installed to "standard" locations such as /usr/local/bin/ instead of /usr/bin/. To get back to the old...

duplicate
stale

### What is the version? 3.3.5-3.4.1 ### What happened? Metrics like `DCGM_FI_PROF_GR_ENGINE_ACTIVE` are only exposed for one single pod even though there are multiple pods that use the same GPU...

bug
time-slicing

### Is this a new feature, an improvement, or a change to existing functionality? Improvement ### Please provide a clear description of the problem this feature solves We need to...

enhancement

Using the calculation of `sum` for the power gauge results in summing the gauge over time, rather than reflecting the current amount as a gauge should. Swap it for `lastNotNull`.
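The difference between the two reducers can be sketched in a few lines of Python. This is only an illustration of the idea behind `sum` and `lastNotNull`, not Grafana's implementation; the function names `reduce_sum` and `reduce_last_not_null` and the sample wattage values are made up for the example.

```python
from typing import Optional, Sequence


def reduce_sum(samples: Sequence[Optional[float]]) -> float:
    """Add up every non-null sample in the window (what `sum` does conceptually)."""
    return sum(v for v in samples if v is not None)


def reduce_last_not_null(samples: Sequence[Optional[float]]) -> Optional[float]:
    """Return the most recent non-null sample (what `lastNotNull` does conceptually)."""
    for v in reversed(samples):
        if v is not None:
            return v
    return None


# Hypothetical power readings in watts scraped over one time window;
# None stands for a missing scrape.
power_watts = [250.0, 255.0, None, 260.0, None]

total = reduce_sum(power_watts)             # grows with the number of samples
current = reduce_last_not_null(power_watts)  # the latest known reading

print(f"sum: {total} W, lastNotNull: {current} W")
```

With `sum`, the displayed value depends on how many samples fall inside the window, so it does not correspond to any real power draw. `lastNotNull` reports the latest reading, which is what a gauge panel is expected to show.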

I have an A800 device. When I enable MIG mode, we cannot get a metric from `dcgm-exporter` that tells us whether the device's current MIG mode is single or mixed.

question
action_required_from_requester

Bumps [github.com/docker/docker](https://github.com/docker/docker) from 24.0.7+incompatible to 24.0.9+incompatible. Release notes Sourced from github.com/docker/docker's releases. v24.0.9 24.0.9 For a full list of pull requests and changes in this release, refer to the relevant...

Need to be able to configure both: * https://github.com/NVIDIA/dcgm-exporter/blob/main/deployment/templates/daemonset.yaml#L109 and * https://github.com/NVIDIA/dcgm-exporter/blob/main/deployment/templates/daemonset.yaml#L114 The initial delay is too small on AKS.

enhancement