dcgm-exporter issues

37 dcgm-exporter issues


Fix Grafana dashboard not displaying properly in vGPU clusters

Fix https://github.com/NVIDIA/dcgm-exporter/issues/236

### Test

Create a k8s cluster with vGPU configured on one node.

```
kc get node hw-sks-test-vgpu-vgpunode-8jwnn -oyaml | yq '.status.allocatable'
cpu: "8"
ephemeral-storage: "57976119610"
hugepages-2Mi: "0"
memory:...
```

Levi080513

NVIDIA DCGM Exporter Dashboard does not work in vGPU clusters

Currently we use `DCGM_FI_DEV_GPU_TEMP` to obtain the instance/GPU list, but this metric is not collected in vGPU clusters, which prevents the dashboard from displaying properly. https://github.com/NVIDIA/dcgm-exporter/blob/30d4ddcae9c7153c31dd35301aa4a1f3b90a2096/grafana/dcgm-exporter-dashboard.json#L784 https://github.com/NVIDIA/dcgm-exporter/blob/30d4ddcae9c7153c31dd35301aa4a1f3b90a2096/grafana/dcgm-exporter-dashboard.json#L761

Levi080513
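A minimal Python sketch of why a GPU list derived from a single metric comes back empty when that metric is not exported: it collects the `gpu` label values of one metric from a Prometheus text-format scrape and, as a hypothetical alternative, tries a list of candidate metrics in order. It does not reproduce the dashboard's PromQL. `DCGM_FI_DEV_GPU_UTIL` is only an assumed fallback (whether it is collected in a given vGPU setup is not established here), and the helper names and sample scrape are illustrative.

```python
import re

# One sample line: metric_name{labels} value. Comments and label-less lines are skipped.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)\{(.*)\}\s+\S+')
# One label pair: name="value", allowing escaped characters inside the value.
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def gpus_for_metric(exposition: str, metric: str) -> list[str]:
    """Return the sorted, de-duplicated `gpu` label values seen for one metric."""
    gpus = set()
    for line in exposition.splitlines():
        match = SAMPLE_RE.match(line)
        if not match or match.group(1) != metric:
            continue
        labels = dict(LABEL_RE.findall(match.group(2)))
        if "gpu" in labels:
            gpus.add(labels["gpu"])
    return sorted(gpus)


def gpu_list(exposition: str, candidates: list[str]) -> list[str]:
    """Hypothetical fallback: try candidate metrics in order, return the first non-empty list."""
    for metric in candidates:
        gpus = gpus_for_metric(exposition, metric)
        if gpus:
            return gpus
    return []


# Made-up scrape in which the temperature metric is absent.
scrape = (
    'DCGM_FI_DEV_GPU_UTIL{gpu="0",UUID="GPU-a"} 12\n'
    'DCGM_FI_DEV_GPU_UTIL{gpu="1",UUID="GPU-b"} 40\n'
)
print(gpus_for_metric(scrape, "DCGM_FI_DEV_GPU_TEMP"))  # []
print(gpu_list(scrape, ["DCGM_FI_DEV_GPU_TEMP", "DCGM_FI_DEV_GPU_UTIL"]))  # ['0', '1']
```

The first call mirrors the reported symptom: a variable built only on the temperature metric has nothing to list.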

Support for reporting FP8 and Transformer Engine usage on H100 GPUs (repost from DCGM GitHub)

Hi @glowkey, [I posted the following issue in the DCGM GitHub](https://github.com/NVIDIA/DCGM/issues/86) repo, mainly to get it into dcgm-exporter, but maybe I should have posted it here. I've posted the text below,...

hassanbabaie

Possible to get MIG UUID as a label in Prometheus

Currently, the metrics carry the UUID of the GPU as a label, not the UUID of the MIG partition. Is there a way to get the...

avickars
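Until a MIG UUID label exists, one hedged workaround is to key series on the parent GPU `UUID` plus the GPU instance ID. The sketch below assumes the exporter attaches a `GPU_I_ID` label to MIG series and omits it for whole-GPU series; the label values and the `gi-` key format are made up for illustration. The combined key is not a MIG UUID, only an identifier that distinguishes instances on the same GPU. In PromQL the same idea amounts to aggregating `by (UUID, GPU_I_ID)`.

```python
def mig_key(labels: dict[str, str]) -> str:
    """Combine the parent GPU UUID with the GPU instance ID (assumed label GPU_I_ID)."""
    uuid = labels.get("UUID", "")
    gi_id = labels.get("GPU_I_ID")
    # Series for a whole (non-MIG) GPU carry no GPU_I_ID, so the UUID alone is used.
    return f"{uuid}/gi-{gi_id}" if gi_id is not None else uuid


# Made-up label sets: two MIG instances on one GPU plus one non-MIG GPU.
series = [
    {"UUID": "GPU-example", "GPU_I_ID": "1", "GPU_I_PROFILE": "3g.20gb"},
    {"UUID": "GPU-example", "GPU_I_ID": "2", "GPU_I_PROFILE": "3g.20gb"},
    {"UUID": "GPU-other"},
]
print([mig_key(s) for s in series])
# ['GPU-example/gi-1', 'GPU-example/gi-2', 'GPU-other']
```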

Metric about compute apps

Do we have any metrics for the compute processes allocated on a GPU, or would it be worth adding one? Something like the following nvidia-smi output:

```
> nvidia-smi --query-compute-apps=gpu_uuid,name...
```

onstring

enhancement

question
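A rough sketch of what such a metric could look like: it runs `nvidia-smi --query-compute-apps` with CSV output and turns each row into one Prometheus-style sample. The metric name `gpu_compute_app_used_memory_mib` and the helper functions are hypothetical and not part of dcgm-exporter; `used_memory` is assumed to be reported in MiB when `nounits` is set.

```python
import csv
import io
import shutil
import subprocess

FIELDS = ["gpu_uuid", "pid", "process_name", "used_memory"]


def _escape(value: str) -> str:
    """Escape a label value for the Prometheus text format."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def query_compute_apps() -> str:
    """Run nvidia-smi and return its CSV output (no header, no units)."""
    return subprocess.run(
        ["nvidia-smi", f"--query-compute-apps={','.join(FIELDS)}",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout


def to_gauges(csv_text: str) -> list[str]:
    """Emit one sample of the hypothetical metric per running compute process."""
    lines = []
    for row in csv.reader(io.StringIO(csv_text), skipinitialspace=True):
        if len(row) != len(FIELDS):
            continue  # skip blank or unexpected lines
        rec = dict(zip(FIELDS, row))
        labels = (
            f'gpu_uuid="{_escape(rec["gpu_uuid"])}",'
            f'pid="{_escape(rec["pid"])}",'
            f'process_name="{_escape(rec["process_name"])}"'
        )
        lines.append(f"gpu_compute_app_used_memory_mib{{{labels}}} {rec['used_memory']}")
    return lines


# Only query the hardware when nvidia-smi is actually installed.
if __name__ == "__main__" and shutil.which("nvidia-smi"):
    for line in to_gauges(query_compute_apps()):
        print(line)
```

Keeping parsing separate from the subprocess call makes the conversion easy to check without a GPU.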

Zero values for MIG instances using dcgm-exporter.

Hi,

**DCGM Version: 2.2.9**
**CUDA: 11.4**
**Driver: datacenter-gpu-manager-2.2.9-1.x86_64**

We recently purchased a Dell R750xa with four A100-40GB GPUs. I built the dcgm-exporter binary from source and when running can...

Shadowphax

How to query rated power?

### Is this a new feature, an improvement, or a change to existing functionality?

New Feature

### Please provide a clear description of the problem this feature solves

How do...

wade-liwei

enhancement
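A sketch of one way to read power limits outside the exporter, assuming "rated power" means the GPU's default power limit: it queries `nvidia-smi --query-gpu` for `power.default_limit`, `power.limit`, and `power.max_limit` and parses the CSV, mapping placeholders such as `[N/A]` to `None`. The field choice is an interpretation of the question, and the helper names are illustrative.

```python
import csv
import io
import shutil
import subprocess
from typing import Dict, Optional

# Assumption: "rated power" ~ power.default_limit; the other two limits are shown for context.
FIELDS = ["uuid", "power.default_limit", "power.limit", "power.max_limit"]


def parse_watts(value: str) -> Optional[float]:
    """Convert a numeric field to watts; placeholders like '[N/A]' become None."""
    try:
        return float(value)
    except ValueError:
        return None


def parse_power_limits(csv_text: str) -> Dict[str, Dict[str, Optional[float]]]:
    """Map each GPU UUID to its power-limit fields, parsed from header-less, unit-less CSV."""
    result = {}
    for row in csv.reader(io.StringIO(csv_text), skipinitialspace=True):
        if len(row) != len(FIELDS):
            continue  # skip blank or unexpected lines
        uuid, *values = row
        result[uuid] = {name: parse_watts(v) for name, v in zip(FIELDS[1:], values)}
    return result


def query_power_limits() -> Dict[str, Dict[str, Optional[float]]]:
    """Run nvidia-smi on the local host and parse its output."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={','.join(FIELDS)}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_power_limits(out)


# Only query the hardware when nvidia-smi is actually installed.
if __name__ == "__main__" and shutil.which("nvidia-smi"):
    for gpu_uuid, limits in query_power_limits().items():
        print(gpu_uuid, limits)
```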
