Allow MPS control daemonset to be explicitly disabled
This change allows the deployment of the MPS daemon to be explicitly disabled by adding --set mps.enabled=false to a Helm install / upgrade command.
The default behaviour of the plugin is to deploy the MPS control daemonset even if no MPS sharing is configured. However, the actual MPS daemon is only started if a GPU is replicated using MPS. The mps.enabled Helm value allows the deployment of the daemonset itself to be explicitly disabled.
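As a rough sketch, the same setting could also be supplied through a values file rather than `--set`. The file name `values-no-mps.yaml` is hypothetical, and `mps.enabled` is the value proposed in this PR, so it only applies to a chart version that includes this change:

```yaml
# values-no-mps.yaml (hypothetical file name)
mps:
  # Value proposed in this PR. It defaults to true, so installs that do
  # not set it keep deploying the MPS control daemonset as before.
  enabled: false
```

Such a file would be passed to the Helm install / upgrade command with `-f values-no-mps.yaml`, which should be equivalent to `--set mps.enabled=false`.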
Fixes #1177. The change is backward compatible, so it should be a minor change.
I'm unsure whether we also need to change the value of the NVIDIA_DRIVER_CAPABILITIES env var when running without MPS.
@chipzoller can you re-review this?
This PR is stale because it has been open 90 days with no activity. This PR will be closed in 30 days unless new comments are made or the stale label is removed.
@elezar if a user upgrades the device plugin via a Helm chart upgrade, the default value of true will be applied, so no undesired changes are expected.
Also, if a user upgrades the device plugin via the image value, the Helm chart change I'm proposing isn't taken into consideration, so again no undesired changes are expected.
Sorry for the delay in getting to this.
Since we are slowly working on adding MPS support to the k8s DRA driver for GPUs, we may want to consider making the MPS support in the device plugin explicitly opt-in. We have not had the time to flesh out this feature further, and making it opt-in, in addition to requiring an explicit sharing config with MPS replicas, would help to set expectations accordingly.
@klueska what are your thoughts on this?