[vSphere] Machine Cleanup fails due to inability to remove tags
Since #1213 was merged, our machine-controller deployment is unable to finish machine cleanup and keeps unused, drained machines running indefinitely.
The machine cleanup fails with "failed to delete machine at cloud provider", caused by "failed to delete tags". machine-controller tries to delete tags that it is not supposed to touch (they are created and used by our external monitoring system), and it fails because it lacks the permissions to remove them.
The assumption seems to be that all tags on the machine were created by machine-controller. I think this does not hold in most vSphere deployments, since many monitoring, backup, and compliance systems use tags to keep track of vSphere objects.
To fix this, I think the code linked here should check which of the attached tags are actually present in the configuration object and delete only those whose names match the config struct, instead of trying to delete every tag attached to the machine. https://github.com/kubermatic/machine-controller/blob/19af243aeb7fbacfb2706c5d5b63b59de9796f50/pkg/cloudprovider/provider/vsphere/helper.go#L499-L509
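A rough sketch of the filtering I have in mind is below. It is not the actual machine-controller code: `attachedTag` and `tagsToDelete` are hypothetical names, and I am assuming the configured tags can be reduced to a list of names. Matching on name alone may not be enough if the same name exists in several categories, so the real fix might need to compare the category as well.

```go
package main

// attachedTag is a hypothetical stand-in for a tag attached to a VM.
// It is not a type from machine-controller or govmomi.
type attachedTag struct {
	ID   string
	Name string
}

// tagsToDelete returns only those attached tags whose names appear in the
// machine's configuration. Tags owned by other systems are left out, so the
// cleanup never tries to remove them.
func tagsToDelete(attached []attachedTag, configured []string) []attachedTag {
	// Build a set of configured tag names for constant-time lookups.
	managed := make(map[string]struct{}, len(configured))
	for _, name := range configured {
		managed[name] = struct{}{}
	}

	var out []attachedTag
	for _, t := range attached {
		if _, ok := managed[t.Name]; ok {
			out = append(out, t)
		}
	}
	return out
}
```

With this in place, the cleanup would iterate over the result of `tagsToDelete` rather than over every tag attached to the machine.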