kubeadm
Abnormal removal of etcd member during reset when the node has multiple network interfaces
What keywords did you search in kubeadm issues before filing this one?
In the case of multiple network cards, abnormal removal of etcd member during reset.
Is this a BUG REPORT or FEATURE REQUEST?
BUG REPORT
Versions
kubeadm version (use kubeadm version):
Environment:
- Kubernetes version (use kubectl version):
- Cloud provider or hardware configuration:
- OS (e.g. from /etc/os-release):
- Kernel (e.g. uname -a):
- Container runtime (CRI) (e.g. containerd, cri-o):
- Container networking plugin (CNI) (e.g. Calico, Cilium):
- Others:
What happened?
When the node has multiple network interfaces, kubeadm reset does not remove its etcd member correctly.
What you expected to happen?
How to reproduce it (as minimally and precisely as possible)?
Anything else we need to know?
Can you explain exactly what happens? Generally, we don't want to add APIEndpoint or other init/join control-plane options to ResetConfiguration.
Also, please see this section of the docs: https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/#network-setup
Kubernetes doesn't support multiple network interfaces well.
@neolit123 Thanks, let me explain. See this code:
https://github.com/yxxhero/kubernetes/blob/cd5e65709e4874af87305714c7866ab59c8f9c3a/cmd/kubeadm/app/phases/etcd/local.go#L116-L127
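```go
// The peer URL is derived from cfg.LocalAPIEndpoint, so it is only correct
// if that endpoint holds the address etcd actually advertises for peers.
etcdPeerAddress := etcdutil.GetPeerURL(&cfg.LocalAPIEndpoint)
klog.V(2).Infof("[etcd] get the member id from peer: %s", etcdPeerAddress)
id, err := etcdClient.GetMemberID(etcdPeerAddress)
if err != nil {
	// errors.Is takes (err, target); the snippet as quoted passed them in
	// reverse order, which only matches when err is the unwrapped sentinel.
	if errors.Is(err, etcdutil.ErrNoMemberIDForPeerURL) {
		klog.V(5).Infof("[etcd] member was already removed, because no member id exists for peer %s", etcdPeerAddress)
		return nil
	}
	return err
}
```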
When a node is added with kubeadm join --control-plane --apiserver-advertise-address <some_ip>, running kubeadm reset on that node fails to remove its etcd member: reset computes the peer URL from the default IP address, which differs from the address passed to --apiserver-advertise-address. As a result, the etcd member has to be removed manually.
Alternatively, maybe we can get the right IP from etcd.yaml? @neolit123
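A sketch of the etcd.yaml idea, under assumptions: the static pod manifest passes --initial-advertise-peer-urls as a "--flag=value" command entry, and plain line scanning stands in for properly decoding the YAML into a Pod object. The function name peerURLFromManifest and the manifest excerpt are hypothetical.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"strings"
)

const peerURLsFlag = "--initial-advertise-peer-urls="

// peerURLFromManifest scans an etcd static pod manifest for the
// --initial-advertise-peer-urls flag and returns its first URL.
// Simplification: line scanning instead of decoding the YAML.
func peerURLFromManifest(manifest string) (string, error) {
	scanner := bufio.NewScanner(strings.NewReader(manifest))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		line = strings.TrimPrefix(line, "- ") // strip the YAML list marker
		if strings.HasPrefix(line, peerURLsFlag) {
			value := strings.TrimPrefix(line, peerURLsFlag)
			// The flag may hold a comma-separated list; take the first entry.
			return strings.Split(value, ",")[0], nil
		}
	}
	if err := scanner.Err(); err != nil {
		return "", err
	}
	return "", errors.New("initial-advertise-peer-urls flag not found")
}

func main() {
	// Hypothetical excerpt of /etc/kubernetes/manifests/etcd.yaml.
	manifest := `spec:
  containers:
  - command:
    - etcd
    - --initial-advertise-peer-urls=https://192.168.10.5:2380
    - --listen-peer-urls=https://192.168.10.5:2380
`
	url, err := peerURLFromManifest(manifest)
	fmt.Println(url, err)
}
```

Reading the flag etcd was actually started with avoids guessing which interface is "default", at the cost of depending on the manifest still being present on disk during reset.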
But kubeadm annotates the etcd and kube-apiserver pods: https://kubernetes.io/docs/reference/labels-annotations-taints/#kubeadm-kubernetes-io-kube-apiserver-advertise-address-endpoint
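A sketch of deriving the peer URL from the kube-apiserver annotation, assuming its value has the host:port form described in the linked docs and that etcd advertises on the same IP with the default peer port 2380. Fetching the pod from the API server is omitted; the annotations map and the function name peerURLFromAnnotations are hypothetical.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// Annotation key as listed in the linked Kubernetes documentation.
const apiServerEndpointAnnotation = "kubeadm.kubernetes.io/kube-apiserver.advertise-address.endpoint"

// peerURLFromAnnotations derives an etcd peer URL from kube-apiserver pod
// annotations, assuming etcd uses the same IP and the default peer port 2380.
func peerURLFromAnnotations(annotations map[string]string) (string, error) {
	endpoint, ok := annotations[apiServerEndpointAnnotation]
	if !ok {
		return "", fmt.Errorf("annotation %q not found", apiServerEndpointAnnotation)
	}
	// SplitHostPort also handles bracketed IPv6 endpoints like "[fd00::5]:6443".
	host, _, err := net.SplitHostPort(endpoint)
	if err != nil {
		return "", fmt.Errorf("parsing %q: %w", endpoint, err)
	}
	return "https://" + net.JoinHostPort(host, strconv.Itoa(2380)), nil
}

func main() {
	// Hypothetical annotations on this node's kube-apiserver static pod.
	annotations := map[string]string{
		apiServerEndpointAnnotation: "192.168.10.5:6443",
	}
	url, err := peerURLFromAnnotations(annotations)
	fmt.Println(url, err)
}
```

If the annotation carries the address given to --apiserver-advertise-address, a LocalAPIEndpoint built from it should yield the same peer URL etcd registered, which is what the reply suggests checking.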
etcdPeerAddress := etcdutil.GetPeerURL(&cfg.LocalAPIEndpoint)
IIRC, this cfg is constructed from data on the node. It's an InitConfiguration, but during reset it gets data from various places, and the LocalAPIEndpoint should be the one from the kube-apiserver pod.
So, IMO, instead of adding the field to ResetConfiguration, we need to investigate what's not working in the existing code.
@neolit123 Yeah, I will do as you advise.
There might be a bug in our code. Let's take our time to understand the problem better, because etcd management is a sensitive area of kubeadm.
EDIT: Also, please show example IPs: what is expected, what is in the annotations, and what the code gives you. Perhaps add some fmt.Printf... calls in the logic.
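One way to lay out the requested comparison, sketched under assumptions: peerURLReport is a hypothetical debugging helper (not part of kubeadm) that puts the expected peer URL, the one derived from the annotations, and the one the reset code computed side by side; the IPs in main are made up.

```go
package main

import "fmt"

// peerURLReport formats the three peer URLs on one line and flags whether
// the computed value matches the expected one.
func peerURLReport(expected, fromAnnotation, computed string) string {
	status := "OK"
	if computed != expected {
		status = "MISMATCH"
	}
	return fmt.Sprintf("expected=%s annotation=%s computed=%s status=%s",
		expected, fromAnnotation, computed, status)
}

func main() {
	// Hypothetical values: advertised IP 192.168.10.5, default-route IP 10.0.0.5.
	fmt.Printf("%s\n", peerURLReport(
		"https://192.168.10.5:2380", // address passed to --apiserver-advertise-address
		"https://192.168.10.5:2380", // derived from the pod annotation
		"https://10.0.0.5:2380",     // what reset computed from LocalAPIEndpoint
	))
}
```

Printing all three in one line makes it easy to tell whether the annotation or the reset-time config is the source of the wrong address.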
@neolit123 please review my new idea.
Can we close this issue and https://github.com/kubernetes/kubernetes/pull/123110?
sure