dcgm-exporter
The pod for a given GPU in k8s mode cannot be captured
What happened?
Unable to collect GPU metrics for relevant pods when using passthrough mode. For example, dcgm-exporter does not collect metrics when a VM created with kubevirt mounts a GPU in passthrough mode.
The kubevirt VMI YAML for mounting GPUs:
spec:
domain:
devices:
...
gpus:
- deviceName: nvidia.com/GP104GL_TESLA_P4
name: gpu1
The resources section of the kubevirt launcher pod that needs to be monitored:
resources:
...
requests:
...
nvidia.com/GP104GL_TESLA_P4: "1"
I have some GPU cards mounted in my cluster, and from kubectl describe node I can get the following information:
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
...
nvidia.com/GP104GL_TESLA_P4 2 2
nvidia.com/GRID_P4-1Q 0 0
nvidia.com/GRID_P4-4Q 0 0
In this case, the GPU cards are assigned to pods for which dcgm-exporter cannot capture GPU metrics.
In the following code, the matching rules are resourceName == nvidiaResourceName or strings.HasPrefix(resourceName, nvidiaMigResourcePrefix). Since nvidiaResourceName is "nvidia.com/gpu", devices mounted under other resource names are filtered out. https://github.com/NVIDIA/dcgm-exporter/blob/main/pkg/dcgmexporter/kubernetes.go#L142
func (p *PodMapper) toDeviceToPod(
    devicePods *podresourcesapi.ListPodResourcesResponse, sysInfo SystemInfo,
) map[string]PodInfo {
    deviceToPodMap := make(map[string]PodInfo)
    for _, pod := range devicePods.GetPodResources() {
        for _, container := range pod.GetContainers() {
            for _, device := range container.GetDevices() {
                resourceName := device.GetResourceName()
                if resourceName != nvidiaResourceName {
                    // Mig resources appear differently than GPU resources
                    if !strings.HasPrefix(resourceName, nvidiaMigResourcePrefix) {
                        continue
                    }
                }
                ...
            }
        }
    }
    return deviceToPodMap
}
This appears to be because dcgm-exporter strictly follows the Kubernetes device plugin specification for determining GPU resources (refer to the k8s device plugin documentation), but that specification can't cover all scenarios.
The ResourceName it wants to advertise. Here ResourceName needs to follow the extended resource naming scheme as vendor-domain/resourcetype. (For example, an NVIDIA GPU is advertised as nvidia.com/gpu.)
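A possible relaxation, sketched below only as an illustration (it assumes every resource under the nvidia.com/ vendor domain should be treated as a GPU, which is not the project's actual fix), would accept names such as nvidia.com/GP104GL_TESLA_P4 and nvidia.com/GRID_P4-1Q as well. The helper name and constants here are hypothetical.

// Hypothetical helper, not part of dcgm-exporter: accept every resource that
// belongs to the nvidia.com vendor domain instead of only "nvidia.com/gpu"
// and the MIG prefix.
package main

import (
    "fmt"
    "strings"
)

const (
    nvidiaResourceName      = "nvidia.com/gpu"
    nvidiaMigResourcePrefix = "nvidia.com/mig-"
    nvidiaDomainPrefix      = "nvidia.com/" // assumed relaxed match
)

// isWatchedResource reports whether a pod-resources entry should be mapped
// to a GPU, using the relaxed vendor-domain rule described above.
func isWatchedResource(resourceName string) bool {
    return resourceName == nvidiaResourceName ||
        strings.HasPrefix(resourceName, nvidiaMigResourcePrefix) ||
        strings.HasPrefix(resourceName, nvidiaDomainPrefix)
}

func main() {
    for _, name := range []string{
        "nvidia.com/gpu",
        "nvidia.com/GP104GL_TESLA_P4",
        "nvidia.com/GRID_P4-1Q",
        "example.com/other-device",
    } {
        fmt.Printf("%-30s watched=%v\n", name, isWatchedResource(name))
    }
}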
What did you expect to happen?
GPU metrics should be collected when a GPU card is mounted using kubevirt passthrough mode.
What is the GPU model?
What is the environment?
pod
How did you deploy the dcgm-exporter and what is the configuration?
GPU Operator
How can we reproduce the issue?
Mounting a GPU card using kubevirt passthrough mode.
What is the version?
Latest
Anything else we need to know?
There are some discussions in the kubevirt community: https://github.com/kubevirt/kubevirt/issues/11660
@rokkiter , dcgm-exporter depends on https://github.com/NVIDIA/k8s-device-plugin and uses the pod-resources API to read the mapping between pods and devices: https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins/ .
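For reference, here is a minimal sketch of querying the kubelet pod-resources API over its gRPC Unix socket. This is not dcgm-exporter's actual code; the socket path is the common kubelet default and is an assumption here.

// Minimal sketch of reading the pod/device mapping from the kubelet
// pod-resources API. The socket path below is the usual default and is
// configurable on the kubelet.
package main

import (
    "context"
    "fmt"
    "log"
    "time"

    "google.golang.org/grpc"
    "google.golang.org/grpc/credentials/insecure"
    podresourcesapi "k8s.io/kubelet/pkg/apis/podresources/v1"
)

func main() {
    const socket = "unix:///var/lib/kubelet/pod-resources/kubelet.sock"

    conn, err := grpc.Dial(socket, grpc.WithTransportCredentials(insecure.NewCredentials()))
    if err != nil {
        log.Fatalf("connect to kubelet pod-resources socket: %v", err)
    }
    defer conn.Close()

    client := podresourcesapi.NewPodResourcesListerClient(conn)

    ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
    defer cancel()

    resp, err := client.List(ctx, &podresourcesapi.ListPodResourcesRequest{})
    if err != nil {
        log.Fatalf("list pod resources: %v", err)
    }

    // Print every (pod, container, resource name, device IDs) tuple so the
    // mapping used by dcgm-exporter-style tooling is visible.
    for _, pod := range resp.GetPodResources() {
        for _, container := range pod.GetContainers() {
            for _, device := range container.GetDevices() {
                fmt.Printf("%s/%s %s %s -> %v\n",
                    pod.GetNamespace(), pod.GetName(), container.GetName(),
                    device.GetResourceName(), device.GetDeviceIds())
            }
        }
    }
}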
Kubevirt support is a new environment for us. Can you give us details on setting up an environment to reproduce the issue?
Also, please explain your use case to justify the feature.
Installation environment reference: https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/gpu-operator-kubevirt.htm
kubevirt GPU configuration reference: https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/gpu-operator-kubevirt.html#add-gpu-resources-to-kubevirt-cr
Using Prometheus, I was able to get monitoring information for the pods (not created by kubevirt) in my environment that use nvidia.com/gpu resources, but not for the pods (created by kubevirt) that use nvidia.com/GRID_P4-1Q.
environment
- kubevirt: v1.0.0
- k8s: v1.25.6
- gpu-operator: v23.9.0
Node configuration to support pass-through mode:
- enable IOMMU on the node; refer to https://www.server-world.info/en/note?os=CentOS_7&p=kvm&f=10
- add the label gpu.workload.config=vm-passthrough to the node
- update the gpu-operator config:
gpu-operator.sandboxWorkloads.enabled=true
gpu-operator.vfioManager.enabled=true
gpu-operator.sandboxDevicePlugin.enabled=true
gpu-operator.sandboxDevicePlugin.version=v1.2.4
gpu-operator.toolkit.version=v1.14.3-ubuntu20.04
@rokkiter , thank you for the update and the details you provided.
Thanks for focusing on this issue.
I recently realized that nodes configured for pass-through mode do not get dcgm-exporter installed. Even if I manually label the node with nvidia.com/gpu.deploy.dcgm-exporter=true, the label is automatically removed!
Although it doesn't seem possible to monitor kubevirt VM GPU usage at the moment, it would be nice to have a solution for this!
I have the same question. The nvidiaResourceName should not be hard-coded as nvidia.com/gpu. If https://github.com/NVIDIA/k8s-device-plugin (which I patched and rebuilt to rename the ResourceName) advertises a ResourceName such as nvidia.com/a100, dcgm-exporter cannot collect the deviceToPodMap for that ResourceName.
for _, pod := range devicePods.GetPodResources() {
    for _, container := range pod.GetContainers() {
        for _, device := range container.GetDevices() {
            resourceName := device.GetResourceName()
            if resourceName != nvidiaResourceName {
                // Mig resources appear differently than GPU resources
                if !strings.HasPrefix(resourceName, nvidiaMigResourcePrefix) {
                    continue
                }
            }
            podInfo := PodInfo{
                Name:      pod.GetName(),
                Namespace: pod.GetNamespace(),
                Container: container.GetName(),
            }
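As an alternative sketch (again not the approach taken in the project; the environment variable name below is invented purely for illustration), the accepted resource names could be made configurable by the deployment instead of being hard-coded:

// Sketch only: read an optional comma-separated list of extra GPU resource
// names from a hypothetical environment variable, so deployments that
// advertise e.g. nvidia.com/a100 can still be mapped to pods.
package main

import (
    "fmt"
    "os"
    "strings"
)

const nvidiaResourceName = "nvidia.com/gpu"

// extraResourceNames parses the hypothetical DCGM_EXPORTER_EXTRA_RESOURCES
// variable, e.g. "nvidia.com/a100,nvidia.com/GRID_P4-1Q".
func extraResourceNames() map[string]struct{} {
    names := map[string]struct{}{}
    for _, n := range strings.Split(os.Getenv("DCGM_EXPORTER_EXTRA_RESOURCES"), ",") {
        if n = strings.TrimSpace(n); n != "" {
            names[n] = struct{}{}
        }
    }
    return names
}

func main() {
    extras := extraResourceNames()
    for _, name := range []string{"nvidia.com/gpu", "nvidia.com/a100"} {
        _, extra := extras[name]
        fmt.Printf("%s accepted=%v\n", name, name == nvidiaResourceName || extra)
    }
}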
@lx1036 , Thank you for your finding. We are accepting PRs;)
@nvvfedorov has already made the PR: https://github.com/NVIDIA/dcgm-exporter/pull/359. Thanks.