Allow MPS control daemonset to be explicitly disabled

Open marinoborges opened this issue 10 months ago • 6 comments

This change allows the deployment of the MPS daemon to be explicitly disabled by adding --set mps.enabled=false to a Helm install / upgrade command.

The default behaviour of the plugin is to deploy the MPS control daemonset even if no MPS sharing is configured. However, the actual MPS daemon is only started if a GPU is replicated using MPS. The mps.enabled Helm value allows the deployment of the daemonset itself to be explicitly disabled.
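As a rough illustration, the same setting could be kept in a values file instead of being passed on the command line. This is only a sketch: the `mps.enabled` key is the value proposed in this change, while the file name `values-no-mps.yaml` is a hypothetical name chosen for the example.

```yaml
# values-no-mps.yaml (hypothetical file name)
# Equivalent to passing --set mps.enabled=false on the command line.
# Assumes the chart includes the mps.enabled value proposed in this PR.
mps:
  enabled: false   # skip deploying the MPS control daemonset
```

Such a file would be supplied with the standard `-f` / `--values` flag of `helm install` or `helm upgrade`. Leaving the key unset should keep the current behaviour, since the default is intended to be true.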

Fixes #1177. The change is backward compatible, so it should be a minor change.

marinoborges avatar Feb 25 '25 15:02 marinoborges

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

copy-pr-bot[bot] avatar Feb 25 '25 15:02 copy-pr-bot[bot]

Unsure whether we also need to change the value of the NVIDIA_DRIVER_CAPABILITIES env var when running without MPS.

marinoborges avatar Feb 27 '25 16:02 marinoborges

@chipzoller can you re-review this?

sidewinder12s avatar Jun 18 '25 17:06 sidewinder12s

This PR is stale because it has been open 90 days with no activity. This PR will be closed in 30 days unless new comments are made or the stale label is removed.

github-actions[bot] avatar Sep 17 '25 04:09 github-actions[bot]

@elezar if a user upgrades the device plugin via a Helm chart upgrade, the default value of true is applied, so no undesired changes are expected. Likewise, if a user upgrades the device plugin only by changing the image value, the Helm chart change I'm proposing isn't taken into account, so again no undesired changes are expected.

marinoborges avatar Sep 18 '25 16:09 marinoborges

Sorry for the delay in getting to this.

Since we are slowly working on adding MPS support to the k8s DRA driver for GPUs, we may want to consider making MPS support in the device plugin explicitly opt-in. We have not had the time to flesh out this feature further, and making it opt-in, in addition to requiring an explicit sharing config with MPS replicas, would help to set expectations accordingly.

@klueska what are your thoughts on this?

elezar avatar Dec 02 '25 14:12 elezar