terraform-provider-kops
GPU instance groups apply loop
Hi 👋
We've upgraded from kops 1.23 to 1.26 (provider 1.26.0-rc1). The upgrade was successful after some trial and error. Now, every time we run apply again, Terraform plans an in-place update:
```
  # kops_instance_group.workers["ondemand-amd-32GiB-8vCPU-1GPU-eu-central-1c"] will be updated in-place
  ~ resource "kops_instance_group" "workers" {
        id          = "domain.com/ondemand-amd-32GiB-8vCPU-1GPU-eu-central-1c"
        name        = "ondemand-amd-32GiB-8vCPU-1GPU-eu-central-1c"
      ~ node_labels = {
          - "kops.k8s.io/gpu" = "1" -> null
        }
      ~ revision    = 7 -> 8
      ~ taints      = [
            "arch=amd64:PreferNoSchedule",
          - "nvidia.com/gpu:NoSchedule",
        ]
        # (29 unchanged attributes hidden)
        # (4 unchanged blocks hidden)
    }
```
The corresponding RuntimeClass looks OK: https://github.com/kubernetes/kops/blob/v1.26.4/upup/models/cloudup/resources/addons/nvidia.addons.k8s.io/k8s-1.16.yaml.template#L44-L59
But somehow the node labels and taints are not in sync between kops and the cluster anymore 🤔
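As a stopgap, one option is Terraform's standard `lifecycle` meta-argument with `ignore_changes`, so the plan stops trying to strip the GPU label and taint on every run. This is only a minimal sketch: the `for_each` and the other arguments of our `workers` resource are not shown above, so they are left as a placeholder comment, and it assumes kops/the provider keeps reporting `"kops.k8s.io/gpu" = "1"` and `nvidia.com/gpu:NoSchedule` for the GPU groups (we have not confirmed where they come from).

```hcl
resource "kops_instance_group" "workers" {
  # ... existing for_each and arguments (cluster_name, name, role, machine_type, subnets, ...) unchanged ...

  lifecycle {
    # Assumption: the GPU label and taint are added outside our config, so
    # Terraform should not try to remove them on each apply.
    # Caveat: this also hides intentional edits to these two attributes.
    ignore_changes = [node_labels, taints]
  }
}
```

The trade-off is that deliberate changes to `node_labels` or `taints` would be ignored too. Declaring the label and taint explicitly for the GPU groups would keep change detection, at the cost of per-group conditionals in the `for_each` configuration.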