
helm: can't upgrade to 0.15.0 in place due to daemonset label selector change

Open mrparkers opened this issue 2 months ago • 3 comments

The newest version of the k8s-device-plugin chart (v0.15.0) seems to have removed support for specifying the label selectors of each daemonset. The selectors can no longer be set through helm values, and because spec.selector is immutable on an existing DaemonSet, it cannot be changed via the k8s API either. This makes an in-place upgrade to v0.15.0 via helm very difficult.

You can use helm template along with yq to observe this change. If you have both of these tools installed, use this one-liner to observe the label selectors for v0.14.5:

helm template nvidia-device-plugin nvdp/nvidia-device-plugin --version 0.14.5 --set gfd.enabled=true | yq e 'select(.kind == "DaemonSet") | select(.metadata.name == "nvidia-device-plugin-gpu-feature-discovery") | .spec.selector.matchLabels'

This results in these label selectors:

app.kubernetes.io/name: gpu-feature-discovery
app.kubernetes.io/instance: nvidia-device-plugin

These are the default label selectors for GFD, but they can be changed via gfd.nameOverride in the values.

However, in v0.15.0, the default label selectors have changed, and there is no way to use helm values to change them back to what they were before:

helm template nvidia-device-plugin nvdp/nvidia-device-plugin --version 0.15.0 --set gfd.enabled=true | yq e 'select(.kind == "DaemonSet") | select(.metadata.name == "nvidia-device-plugin-gpu-feature-discovery") | .spec.selector.matchLabels'

This results in these label selectors:

app.kubernetes.io/name: nvidia-device-plugin
app.kubernetes.io/instance: nvidia-device-plugin
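To make the comparison repeatable, the two selector dumps above can be checked against each other with a small helper. This is only a sketch: `selectors_changed` is a hypothetical function name, and it assumes each dump has been saved to a file (for example by redirecting the output of the two helm template commands).

```shell
# selectors_changed OLD_FILE NEW_FILE
# Hypothetical helper, not part of helm or the chart.
# Compares two saved matchLabels dumps, ignoring key order.
# Returns 0 and prints a warning when they differ, returns 1 when they match.
selectors_changed() {
  old_sorted=$(sort "$1")
  new_sorted=$(sort "$2")
  if [ "$old_sorted" = "$new_sorted" ]; then
    echo "selector unchanged"
    return 1
  fi
  echo "selector changed: spec.selector is immutable, so an in-place upgrade will be rejected"
  return 0
}
```

With the v0.14.5 output saved as one file and the v0.15.0 output as another, the helper reports a change because app.kubernetes.io/name differs between the two.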

Because these label selectors cannot be changed in v0.15.0 by any helm value, any attempt at an upgrade results in an error that looks like this:

Helm upgrade failed for release kube-system/nvidia-device-plugin with chart nvidia-device-plugin@0.15.0: cannot patch "nvidia-device-plugin-gpu-feature-discovery" with kind DaemonSet: DaemonSet.apps "nvidia-device-plugin-gpu-feature-discovery" is invalid: spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{"app.kubernetes.io/instance":"nvidia-device-plugin", "app.kubernetes.io/name":"nvidia-device-plugin"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable
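The error above shows the new selector being rejected. To confirm which selector the deployed DaemonSet still carries, something like the following could be used. It is a minimal sketch: `live_selector` is a hypothetical wrapper name, and the namespace and DaemonSet name are taken from the error message, so they may differ in other installs.

```shell
# live_selector NAMESPACE DAEMONSET
# Hypothetical wrapper around kubectl: prints the matchLabels of the
# DaemonSet as it currently exists in the cluster.
live_selector() {
  kubectl get daemonset "$2" --namespace "$1" \
    --output jsonpath='{.spec.selector.matchLabels}'
}
```

Calling it as live_selector kube-system nvidia-device-plugin-gpu-feature-discovery should print the labels rendered by the previously installed chart version, which can then be compared with the v0.15.0 output.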

Is this a bug in v0.15.0 of the chart, or am I missing some other way to change these label selectors?

mrparkers, May 08 '24 20:05