dcgm-exporter

NVIDIA GPU metrics exporter for Prometheus leveraging DCGM

Results 37 dcgm-exporter issues

Fix https://github.com/NVIDIA/dcgm-exporter/issues/236 ### Test Create a k8s cluster with vGPU configured on one node. ``` kc get node hw-sks-test-vgpu-vgpunode-8jwnn -oyaml | yq '.status.allocatable' cpu: "8" ephemeral-storage: "57976119610" hugepages-2Mi: "0" memory:...

Currently we use `DCGM_FI_DEV_GPU_TEMP` to obtain the instance/GPU list, but this metric is not collected in vGPU clusters, which prevents the dashboard from displaying properly. https://github.com/NVIDIA/dcgm-exporter/blob/30d4ddcae9c7153c31dd35301aa4a1f3b90a2096/grafana/dcgm-exporter-dashboard.json#L784 https://github.com/NVIDIA/dcgm-exporter/blob/30d4ddcae9c7153c31dd35301aa4a1f3b90a2096/grafana/dcgm-exporter-dashboard.json#L761
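To illustrate the problem described above, here is a minimal Python sketch of building the GPU list from whichever `DCGM_FI_*` samples are present instead of relying on a single metric such as `DCGM_FI_DEV_GPU_TEMP`. It is not the dashboard's actual query. The function name `gpus_from_metrics`, the label names `gpu` and `Hostname`, and the sample scrape text are assumptions made for illustration, and which metrics a vGPU cluster really exposes is not stated in the issue.

```python
import re

# One sample line in the Prometheus text format: name{labels} value
SAMPLE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def gpus_from_metrics(text, prefix="DCGM_FI_"):
    """Return sorted (hostname, gpu) pairs seen on any metric with the prefix.

    Hypothetical helper: label names "Hostname" and "gpu" are assumed.
    """
    gpus = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None or not match.group("name").startswith(prefix):
            continue
        labels = dict(LABEL_RE.findall(match.group("labels")))
        if "gpu" in labels:
            gpus.add((labels.get("Hostname", ""), labels["gpu"]))
    return sorted(gpus)


# Hypothetical scrape output in which DCGM_FI_DEV_GPU_TEMP is absent.
SAMPLE = '''\
# HELP DCGM_FI_DEV_FB_USED Framebuffer memory used (in MiB).
DCGM_FI_DEV_FB_USED{gpu="0",Hostname="node-a"} 1024
DCGM_FI_DEV_FB_USED{gpu="1",Hostname="node-a"} 2048
DCGM_FI_DEV_FB_FREE{gpu="0",Hostname="node-a"} 15000
'''

print(gpus_from_metrics(SAMPLE))
```

Because the list is derived from every matching sample, it stays populated as long as at least one `DCGM_FI_*` metric is exported for each GPU.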

Hi @glowkey, [I posted the following issue in the DCGM GitHub repo](https://github.com/NVIDIA/DCGM/issues/86) hoping to get it into dcgm-exporter, but maybe I should have posted it here. I've posted the text below,...

Currently the metrics give the UUID of the GPU as a label, not the UUID of the MIG partition. Is there a way to get the...

Do we have any metrics for the compute processes allocated on a GPU, or would it be worthwhile to add one, like the following output of nvidia-smi: ``` > nvidia-smi --query-compute-apps=gpu_uuid,name...

enhancement
question

Hi, **DCGM Version: 2.2.9** **CUDA: 11.4** **Driver: datacenter-gpu-manager-2.2.9-1.x86_64** We recently purchased a Dell R750xa with 4x A100-40GB GPUs. I built the dcgm-exporter binary from source and when running can...

### Is this a new feature, an improvement, or a change to existing functionality? New Feature ### Please provide a clear description of the problem this feature solves How do...

enhancement