
Fix Grafana dashboard not displaying properly in vGPU clusters

Open Levi080513 opened this issue 7 months ago • 0 comments

Fixes https://github.com/NVIDIA/dcgm-exporter/issues/236

Test

Create a k8s cluster with vGPU configured on one node.

kc get node hw-sks-test-vgpu-vgpunode-8jwnn -oyaml | yq '.status.allocatable'
cpu: "8"
ephemeral-storage: "57976119610"
hugepages-2Mi: "0"
memory: 15968092Ki
nvidia.com/gpu: "1"
pods: "110"
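The allocatable check above can also be scripted when repeating the test on several nodes. The following is a minimal sketch, not part of this PR: the helper names `parse_allocatable` and `gpu_count` are hypothetical, and it assumes the flat `key: value` output that `yq` printed above (it is not a general YAML parser).

```python
# Sample text copied from the `yq '.status.allocatable'` output shown above.
SAMPLE = """\
cpu: "8"
ephemeral-storage: "57976119610"
hugepages-2Mi: "0"
memory: 15968092Ki
nvidia.com/gpu: "1"
pods: "110"
"""


def parse_allocatable(text: str) -> dict[str, str]:
    """Parse flat `key: value` lines into a dict (hypothetical helper).

    Assumes one scalar per line, as in the output above; nested YAML
    is not supported.
    """
    result: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or ":" not in line:
            continue
        # Split on the first colon only; resource names such as
        # `nvidia.com/gpu` contain no colon themselves.
        key, value = line.split(":", 1)
        result[key.strip()] = value.strip().strip('"')
    return result


def gpu_count(allocatable: dict[str, str]) -> int:
    """Return the number of allocatable `nvidia.com/gpu` resources (0 if absent)."""
    return int(allocatable.get("nvidia.com/gpu", "0"))


if __name__ == "__main__":
    allocatable = parse_allocatable(SAMPLE)
    print("allocatable GPUs:", gpu_count(allocatable))
```

In practice the text would come from the `kc get node ... | yq '.status.allocatable'` command rather than a hard-coded sample.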

kc exec -ti nvidia-driver-daemonset-4.18.0-477.27.1.el8.8-rocky8.8-x7qj8 -n sks-system-nvidia-gpu -- nvidia-smi
Mon Jan 29 10:20:21 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GRID V100-4C        On   | 00000000:00:0A.0 Off |                    0 |
| N/A   N/A    P0    N/A /  N/A |     12MiB /  4096MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A    243152      C   /app/gpu_burn                      12MiB |
+-----------------------------------------------------------------------------+

Before the fix:

[image]

After the fix:

[image]

[image]

Levi080513, Jan 29 '24 02:01