
Bug: Incorrect set_default_active_thread_percentage Behavior in Kubernetes Device Plugin with MPS

Open • Atoms opened this issue 2 months ago • 1 comment

Summary

When using NVIDIA MPS with the Kubernetes device plugin, the set_default_active_thread_percentage value is set incorrectly, leading to severe GPU underutilization for workloads scheduled onto the same GPU.

This parameter is global to each MPS control daemon, and since there is one daemon per GPU, setting it incorrectly (e.g. deriving it from the replica count) throttles every workload on that GPU to a fraction of its capacity.
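
For context, the behavior is reproducible with an MPS sharing config along these lines (a minimal sketch: the file name, ConfigMap name, namespace, and replica count are placeholders and assumptions, not values taken from this report):

# Sketch of a device-plugin MPS sharing config; replicas: 2 mirrors the two
# workloads described below but is an assumption about the actual setup.
cat <<'EOF' > mps-config.yaml
version: v1
sharing:
  mps:
    resources:
    - name: nvidia.com/gpu
      replicas: 2
EOF

# Expose it to the device plugin as a ConfigMap (names and namespace are
# placeholders; the plugin still has to be pointed at this config, e.g. via
# its Helm values).
kubectl create configmap nvidia-device-plugin-config \
  -n nvidia-device-plugin \
  --from-file=config=mps-config.yaml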

Observed Behavior

  • With the default active_thread_percentage applied by the device plugin, nvidia-smi (or any other GPU monitoring tool) reports around 60% GPU usage with 2 workloads; adding more workloads pushes usage to 100% and the applications start to slow down.
  • After manually applying set_active_thread_percentage 100 via nvidia-cuda-mps-control, the same workloads drop to ~2–3% GPU usage, showing that the GPU is shared correctly rather than artificially limited.
  • This confirms that the device plugin is configuring MPS with the wrong active_thread_percentage during initialization.
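
One way to check what the control daemon was actually told is to query it directly (a quick sketch; get_default_active_thread_percentage and get_active_thread_percentage are standard nvidia-cuda-mps-control commands, and this assumes it runs where the MPS pipe directory is visible, e.g. inside the workload pod):

# Daemon-wide default applied to new clients.
echo "get_default_active_thread_percentage" | nvidia-cuda-mps-control

# Per-server values: list the server PIDs, then query each one.
for pid in $(echo "get_server_list" | nvidia-cuda-mps-control); do
  echo "get_active_thread_percentage $pid" | nvidia-cuda-mps-control
done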

How to Reproduce

  • Deploy a GPU workload using the Kubernetes NVIDIA device plugin with MPS enabled.
  • Observe GPU utilization in DCGM or nvidia-smi.
  • Exec into the workload pod and run the following, substituting the server PID printed by get_server_list for $SERVERID:
echo "get_server_list" | nvidia-cuda-mps-control
echo "set_active_thread_percentage $SERVERID 100" | nvidia-cuda-mps-control
  • Restart the workload so the new value takes effect (a scripted version of these steps follows this list).
  • Observe that reported GPU usage drops, the workload can scale up again, and the application no longer slows down.
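
The same workaround can be scripted from outside the pod (a sketch; the pod name is a placeholder and it assumes get_server_list returns a single server PID):

kubectl exec <workload-pod> -- sh -c '
  SERVERID=$(echo "get_server_list" | nvidia-cuda-mps-control)
  echo "set_active_thread_percentage $SERVERID 100" | nvidia-cuda-mps-control
'

The restart is needed because set_active_thread_percentage only affects clients that connect to that server after the call; already-running clients keep the old limit.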

Root Cause

set_default_active_thread_percentage is applied based on the replica count, but MPS runs only a single control daemon per GPU, so the setting is global and shared across all workloads on that GPU.
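
To illustrate the arithmetic, if the default is derived as 100 divided by the replica count (an assumed relationship, based on the behavior described above), the control daemon for that GPU ends up configured roughly like this:

# With N replicas the daemon-wide default becomes 100/N (assumption): 2 replicas
# -> 50%, 4 replicas -> 25%, and every client on that GPU observes the limit.
REPLICAS=2
echo "set_default_active_thread_percentage $((100 / REPLICAS))" | nvidia-cuda-mps-control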

Expected Behavior

The device plugin should not override active_thread_percentage unless explicitly configured by the user.

Per-GPU or per-pod resource tuning should not be attempted in this manner without awareness of global MPS constraints.
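
A purely hypothetical shape for such an explicit opt-in is sketched below; the defaultActiveThreadPercentage field does not exist in the plugin today and is shown only to illustrate the expected behavior:

# Hypothetical config: the daemon-wide thread percentage is only touched when the
# user sets it explicitly (field name is invented for illustration).
cat <<'EOF' > mps-config.yaml
version: v1
sharing:
  mps:
    defaultActiveThreadPercentage: 100   # hypothetical, explicit opt-in
    resources:
    - name: nvidia.com/gpu
      replicas: 2
EOF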

Environment Details

  • GPU: NVIDIA RTX 4000
  • Driver version: 580.82.07
  • Container image: nvcr.io/nvidia/k8s-device-plugin:v0.17.4
  • Kubernetes version: 1.32.6

Atoms commented on Nov 07 '25 07:11

I see it will be possible to override this with the DRA driver: https://github.com/NVIDIA/k8s-dra-driver-gpu/blob/main/templates/mps-control-daemon.tmpl.yaml#L33

Atoms commented on Nov 07 '25 08:11