oci-cloud-controller-manager
Deprecated label(failure-domain.beta.kubernetes.io/zone) blocks csi-oci-node-driver startup
Is this a BUG REPORT or FEATURE REQUEST?
BUG REPORT
Versions
CCM Version: v1.27.2
Environment:
- Kubernetes version (use kubectl version): v1.27.10
- OS (e.g. from /etc/os-release): Ubuntu 22.04 LTS
- Kernel (e.g. uname -a): Linux master-node1 5.15.0-97-generic #107-Ubuntu SMP Wed Feb 7 13:26:48 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
- Others:
What happened?
I built a Kubernetes cluster myself with kubespray on VMs in OCI. I want to use FSS as persistent volumes in the cluster, so I am installing the OCI CSI plugin following this document: https://github.com/oracle/oci-cloud-controller-manager/blob/master/container-storage-interface.md
However, when I install the oci-csi-node-driver, the pod fails to start with this error:
```
I0327 07:36:39.288668 1 main.go:109] "Kubelet registration probe created" path="/var/lib/kubelet/plugins/blockvolume.csi.oraclecloud.com/registration"
I0327 07:36:39.311883 1 main.go:120] Received NotifyRegistrationStatus call: &RegistrationStatus{PluginRegistered:false,Error:RegisterPlugin error -- plugin registration failed with err: rpc error: code = Internal desc = Failed to get availability domain of node from kube api server.,}
E0327 07:36:39.311913 1 main.go:122] Registration process failed with error: RegisterPlugin error -- plugin registration failed with err: rpc error: code = Internal desc = Failed to get availability domain of node from kube api server., restarting registration container.
```
After checking the source code, I realized that the node is missing the label failure-domain.beta.kubernetes.io/zone. Here is the relevant code:
```go
func (u *Util) LookupNodeAvailableDomain(k kubernetes.Interface, nodeID string) (string, error) {
	n, err := k.CoreV1().Nodes().Get(context.Background(), nodeID, metav1.GetOptions{})
	if err != nil {
		u.Logger.With(zap.Error(err)).With("nodeId", nodeID).Error("Failed to get Node by name.")
		return "", fmt.Errorf("failed to get node %s", nodeID)
	}
	// The availability domain is read only from kubeAPI.LabelZoneFailureDomain
	// (failure-domain.beta.kubernetes.io/zone); no other label is consulted.
	if n.Labels != nil {
		ad, ok := n.Labels[kubeAPI.LabelZoneFailureDomain]
		if ok {
			return ad, nil
		}
	}
	// Reached when the label is absent: this error is what surfaces as
	// "Failed to get availability domain of node" in the registrar log.
	errMsg := fmt.Sprint("Did not find the label for the fault domain.")
	u.Logger.With("nodeId", nodeID, "label", kubeAPI.LabelZoneFailureDomain).Error(errMsg)
	return "", fmt.Errorf(errMsg)
}
```
However, the Kubernetes documentation says this label is deprecated: https://kubernetes.io/docs/reference/labels-annotations-taints/#failure-domainbetakubernetesiozone
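For illustration only, here is a hypothetical sketch of the behaviour this issue asks for: a zone lookup that prefers the GA label topology.kubernetes.io/zone and falls back to the deprecated beta label. It works on a plain label map rather than the Kubernetes client, and lookupZone and the sample values ("PHX-AD-1", "PHX-AD-2") are made up; this is not the plugin's code or its planned fix.

```go
package main

import (
	"errors"
	"fmt"
)

// Node label keys: the GA topology label and the deprecated beta label
// that the plugin currently reads.
const (
	labelTopologyZone      = "topology.kubernetes.io/zone"
	labelFailureDomainZone = "failure-domain.beta.kubernetes.io/zone" // deprecated
)

// lookupZone is a hypothetical helper. It returns the first non-empty value,
// checking the GA label before the deprecated one.
func lookupZone(labels map[string]string) (string, error) {
	for _, key := range []string{labelTopologyZone, labelFailureDomainZone} {
		if v, ok := labels[key]; ok && v != "" {
			return v, nil
		}
	}
	return "", errors.New("node has neither zone label")
}

func main() {
	nodes := []map[string]string{
		{labelTopologyZone: "PHX-AD-1"},           // GA label only
		{labelFailureDomainZone: "PHX-AD-2"},      // deprecated label only
		{"kubernetes.io/hostname": "master-node1"}, // no zone label at all
	}
	for _, labels := range nodes {
		zone, err := lookupZone(labels)
		fmt.Printf("zone=%q err=%v\n", zone, err)
	}
}
```

With a fallback like this, a node labeled by either generation of tooling would register, and only a node carrying neither label would fail.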
What you expected to happen?
- The driver starts up successfully without this label on the node.
- Could anyone explain what this label is used for in the OCI CSI plugin?
How to reproduce it (as minimally and precisely as possible)?
Bootstrap a Kubernetes cluster yourself, then install the OCI CSI plugin.
Anything else we need to know?
No.
For backward compatibility with older clusters, the failure-domain label is still used. I suggest adding the label to your nodes manually until we ship the fix for the newer topology label. Also, we test releases on self-managed OCI clusters created with Cluster API (https://oracle.github.io/cluster-api-provider-oci/). You might want to try it, since it adds the labels that CCM/CSI require.
@mrunalpagnis
Thanks for your reply.
Where can I find the full list of labels that should be added to the nodes? We are not using Cluster API, so I may need to add all of them manually.
For now, you can add the failure-domain.beta.kubernetes.io/zone label to your nodes, with the node's availability domain as the value. Use of these deprecated labels will soon be discontinued, and we will switch to topology.kubernetes.io/zone.
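One practical snag with the manual workaround: Kubernetes label values may not contain ':', while OCI availability domain names in OCI's API examples carry a tenancy-specific prefix (e.g. Uocm:PHX-AD-1). The hypothetical sketch below strips anything up to the last colon, checks the result against the Kubernetes label-value rule, and builds kubectl label commands. Assumptions: that the plugin expects the short form after the colon (verify against a node labeled by the CCM), and that setting topology.kubernetes.io/zone as well is harmless forward-compatibility; zoneLabelValue and labelCommands are illustrative names only.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Keys to set: the beta key the plugin reads today, plus the GA key it is
// expected to move to (an assumption based on the maintainer's comment).
var zoneLabelKeys = []string{
	"failure-domain.beta.kubernetes.io/zone",
	"topology.kubernetes.io/zone",
}

// Kubernetes label-value rule: at most 63 characters, starting and ending
// with an alphanumeric, with '-', '_', '.' or alphanumerics in between.
var labelValueRE = regexp.MustCompile(`^(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?$`)

// zoneLabelValue (hypothetical) drops a prefix such as "Uocm:" because ':'
// is not allowed in a label value; a name without ':' is used unchanged.
func zoneLabelValue(adName string) (string, error) {
	short := adName[strings.LastIndex(adName, ":")+1:]
	if short == "" || len(short) > 63 || !labelValueRE.MatchString(short) {
		return "", fmt.Errorf("%q is not a valid label value", short)
	}
	return short, nil
}

// labelCommands (hypothetical) builds one kubectl label command per key.
func labelCommands(node, adName string) ([]string, error) {
	value, err := zoneLabelValue(adName)
	if err != nil {
		return nil, err
	}
	cmds := make([]string, 0, len(zoneLabelKeys))
	for _, key := range zoneLabelKeys {
		cmds = append(cmds, fmt.Sprintf("kubectl label node %s %s=%s", node, key, value))
	}
	return cmds, nil
}

func main() {
	cmds, err := labelCommands("master-node1", "Uocm:PHX-AD-1")
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, c := range cmds {
		fmt.Println(c)
	}
}
```

Generating the commands rather than calling the API keeps the sketch dependency-free; the printed lines can be reviewed and run by hand against each node.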