k8s-device-plugin icon indicating copy to clipboard operation
k8s-device-plugin copied to clipboard

allPossibleMigStrategiesAreNone is false when using default values

Open jakubkrzykowski opened this issue 1 year ago • 6 comments

1. Quick Debug Information

N/A

2. Issue or feature description

When trying to deploy the latest (also tested on v0.13.0) version of the helm chart with the default values file I can see that NVIDIA_MIG_MONITOR_DEVICES env var are being set to all and securityContext.capabilities.add are being set to SYS_ADMIN. allPossibleMigStrategiesAreNone checks if migStrategy is set to "none" but doesn't cover the scenario when this value is null (not set at all), which is the default behavior of the helm chart.

Is this intentional?

jakubkrzykowski avatar Apr 22 '24 13:04 jakubkrzykowski

@jakubkrzykowski this sounds like a bug. Looking at the helm helper, we definitely don't check for the null value.

Would you be able to submit a PR with a fix?

As a workaround, please set the value to none for the time being.

elezar avatar Apr 22 '24 14:04 elezar

Hi @elezar ! I was able to workaround it by setting it explicitly to none as you suggested. Just wanted to be sure if this is not intentional. Happy to raise a PR with a fix.

jakubkrzykowski avatar Apr 22 '24 15:04 jakubkrzykowski

I've checked the _helpers.tpl file thoroughly and found that the problem lies with {{- else if ne (include "nvidia-device-plugin.configMapName" .) "true" -}} in the allPossibleMigStrategiesAreNone definition. With the default values, it will never be true because configMapName will always be an empty string, not the string "true", according to this:

{{- define "nvidia-device-plugin.configMapName" -}}
{{- $result := "" -}}
{{- if .Values.config.name -}}
  {{- $result = .Values.config.name -}}
{{- else if not (empty .Values.config.map) -}}
  {{- $result = printf "%s-%s" (include "nvidia-device-plugin.fullname" .) "configs" -}}
{{- end -}}
{{- $result -}}
{{- end -}}

jakubkrzykowski avatar Apr 22 '24 20:04 jakubkrzykowski

@elezar here is the PR to address that - https://github.com/NVIDIA/k8s-device-plugin/pull/675

jakubkrzykowski avatar Apr 22 '24 22:04 jakubkrzykowski

This issue is stale because it has been open 90 days with no activity. This issue will be closed in 30 days unless new comments are made or the stale label is removed.

github-actions[bot] avatar Jul 22 '24 04:07 github-actions[bot]

looks like this introduced a bug https://github.com/NVIDIA/k8s-device-plugin/issues/856 where SYS_ADMIN cap gets dropped after upgrading to the helm-chart after https://github.com/NVIDIA/k8s-device-plugin/commit/4ca56ae17533724910f6e246f08dd0db7b3c37cd was added. Quick workaround until fixed is simply passing --set securityContext.capabilities.add[0]=SYS_ADMIN to the helm chart.

andy108369 avatar Aug 01 '24 16:08 andy108369