allPossibleMigStrategiesAreNone is false when using default values
1. Quick Debug Information
N/A
2. Issue or feature description
When trying to deploy the latest (also tested on v0.13.0) version of the helm chart with the default values file I can see that NVIDIA_MIG_MONITOR_DEVICES env var are being set to all and securityContext.capabilities.add are being set to SYS_ADMIN.
allPossibleMigStrategiesAreNone checks if migStrategy is set to "none" but doesn't cover the scenario when this value is null (not set at all), which is the default behavior of the helm chart.
Is this intentional?
@jakubkrzykowski this sounds like a bug. Looking at the helm helper, we definitely don't check for the null value.
Would you be able to submit a PR with a fix?
As a workaround, please set the value to none for the time being.
Hi @elezar !
I was able to workaround it by setting it explicitly to none as you suggested. Just wanted to be sure if this is not intentional.
Happy to raise a PR with a fix.
I've checked the _helpers.tpl file thoroughly and found that the problem lies with {{- else if ne (include "nvidia-device-plugin.configMapName" .) "true" -}} in the allPossibleMigStrategiesAreNone definition. With the default values, it will never be true because configMapName will always be an empty string, not the string "true", according to this:
{{- define "nvidia-device-plugin.configMapName" -}}
{{- $result := "" -}}
{{- if .Values.config.name -}}
{{- $result = .Values.config.name -}}
{{- else if not (empty .Values.config.map) -}}
{{- $result = printf "%s-%s" (include "nvidia-device-plugin.fullname" .) "configs" -}}
{{- end -}}
{{- $result -}}
{{- end -}}
@elezar here is the PR to address that - https://github.com/NVIDIA/k8s-device-plugin/pull/675
This issue is stale because it has been open 90 days with no activity. This issue will be closed in 30 days unless new comments are made or the stale label is removed.
looks like this introduced a bug https://github.com/NVIDIA/k8s-device-plugin/issues/856 where SYS_ADMIN cap gets dropped after upgrading to the helm-chart after https://github.com/NVIDIA/k8s-device-plugin/commit/4ca56ae17533724910f6e246f08dd0db7b3c37cd was added.
Quick workaround until fixed is simply passing --set securityContext.capabilities.add[0]=SYS_ADMIN to the helm chart.