ipset cali40all-hosts-net cannot be destroyed and calico-kube-controllers fails to initialize the Calico datastore
I reinstalled my Kubernetes cluster and the Calico version upgraded from v3.20.3 to v3.25.2.
Expected Behavior
The ipset can be destroyed and calico-node runs.
Current Behavior
cali40all-hosts-net cannot be destroyed and Calico is unable to run.
- calico-node log:
2023-11-07 10:03:05.205 [INFO][93] felix/ipsets.go 616: Resync found left-over Calico IP set. Queueing deletion. family="inet" setName="cali40all-hosts-net"
2023-11-07 10:03:05.206 [INFO][93] felix/ipsets.go 883: Deleting IP set. family="inet" setName="cali40all-hosts-net"
2023-11-07 10:03:05.206 [INFO][93] felix/ipsets.go 921: Deleting IP set. family="inet" setName="cali40all-hosts-net"
2023-11-07 10:03:05.287 [WARNING][93] felix/ipsets.go 927: Failed to delete IP set, may be out-of-sync. error=exit status 1 family="inet" output="ipset v7.11: Set cannot be destroyed: it is in use by a kernel component\n" setName="cali40all-hosts-net"
- cali40all-hosts-net
$ ipset list cali40all-hosts-net
Name: cali40all-hosts-net
Type: hash:net
Revision: 6
Header: family inet hashsize 1024 maxelem 1048576
Size in memory: 504
References: 2
Number of entries: 1
Members:
192.168.56.198
- cali40all-hosts-net cannot be destroyed:
$ ipset destroy cali40all-hosts-net
ipset v7.15: Set cannot be destroyed: it is in use by a kernel component
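The `References: 2` line in the `ipset list` output above is the key: ipset refuses `destroy` while the reference count is non-zero. A minimal sketch of pulling that count out of the output (the sample text is copied from this thread; on a live node you would pipe `ipset list cali40all-hosts-net` directly):

```shell
# Sample `ipset list` output copied from this thread; in practice use:
#   ipset list cali40all-hosts-net
out='Name: cali40all-hosts-net
Type: hash:net
Revision: 6
Header: family inet hashsize 1024 maxelem 1048576
Size in memory: 504
References: 2
Number of entries: 1'

# Extract the reference count held by the kernel.
refs=$(printf '%s\n' "$out" | awk -F': ' '/^References:/ {print $2}')
echo "references=$refs"
# `ipset destroy` fails while references > 0; each reference is held by
# an iptables rule (in any table, in either backend) or a kernel module.
```

So the question becomes: which component still holds those two references, given that no matching rule shows up in `iptables -nvL`?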
- No rule matches cali40all-hosts-net in iptables:
$ iptables -nvL | grep cali40all-hosts-net
Steps to Reproduce (for bugs)
- Deploy Kubernetes using Kubespray v2.18.1. This installation includes Kubernetes v1.22.8 and Calico v3.20.3.
- Run reset.yml in kubespray to clean cluster.
- Deploy Kubernetes using Kubespray v2.23.0. This installation includes Kubernetes v1.27.5 and Calico v3.25.2.
Your Environment
- Calico version
v3.20.3 to v3.25.2
- Orchestrator version (e.g. kubernetes, mesos, rkt):
kubernetes: v1.22.8 to v1.27.5
kubespray: v2.18.1 to v2.23.0
- Operating System and version:
PRETTY_NAME="Ubuntu 22.04.2 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.2 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
FYI iptables -L doesn't show you all the rules: it only shows you the filter table. Better to use iptables-save to review everything.
That warning message by itself shouldn't be too much of a problem though: are you switching from IPIP to something else (intentionally)? That ipset is used in IPIP mode only.
There's probably another cause for any installation problems you have (switching from IPIP to something that doesn't work in your environment could be that), but more info would be needed.
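To illustrate the `iptables -L` vs `iptables-save` point: `-L` only prints the filter table, so a rule in nat, mangle, or raw that pins the set would be missed. A minimal sketch of searching a full dump for the set name (sample lines copied from the `iptables-save` output later in this thread; on a live node pipe `iptables-save` directly):

```shell
# Sample iptables-save lines copied from this thread; in practice use:
#   iptables-save | grep -- cali40all-hosts-net
dump='-A cali-nat-outgoing -m set --match-set cali40masq-ipam-pools src -j MASQUERADE
-A cali-INPUT -m set --match-set cali40all-vxlan-net src -j ACCEPT'

SET=cali40all-hosts-net
# Count rules that reference the set across every table in the dump.
hits=$(printf '%s\n' "$dump" | grep -c -- "$SET" || true)
echo "rules referencing $SET: $hits"
# Zero hits in iptables-save combined with References: 2 in `ipset list`
# suggests the holder is the legacy backend or an in-kernel user,
# not nft-backend rules.
```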
It was fixed after a reboot: the ipset is destroyed, and calico-kube-controllers can start. I can reproduce the issue on my second node.
Before reboot.
- calico-kube-controllers is CrashLoopBackOff
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system calico-kube-controllers-5b5bfd6db7-pt2wr 0/1 CrashLoopBackOff 7 (2m15s ago) 14m
2023-11-09 02:24:18.167 [INFO][1] main.go 107: Loaded configuration from environment config=&config.Config{LogLevel:"info", WorkloadEndpointWorkers:1, ProfileWorkers:1, PolicyWorkers:1, NodeWorkers:1, Kubeconfig:"", DatastoreType:"kubernetes"}
W1109 02:24:18.167890 1 client_config.go:617] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
2023-11-09 02:24:18.168 [INFO][1] main.go 131: Ensuring Calico datastore is initialized
2023-11-09 02:24:48.169 [ERROR][1] client.go 290: Error getting cluster information config ClusterInformation="default" error=Get "https://10.233.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default": dial tcp 10.233.0.1:443: i/o timeout
2023-11-09 02:24:48.169 [INFO][1] main.go 138: Failed to initialize datastore error=Get "https://10.233.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default": dial tcp 10.233.0.1:443: i/o timeout
2023-11-09 02:25:18.191 [ERROR][1] client.go 290: Error getting cluster information config ClusterInformation="default" error=Get "https://10.233.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default": dial tcp 10.233.0.1:443: i/o timeout
2023-11-09 02:25:18.191 [INFO][1] main.go 138: Failed to initialize datastore error=Get "https://10.233.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default": dial tcp 10.233.0.1:443: i/o timeout
2023-11-09 02:25:18.191 [FATAL][1] main.go 151: Failed to initialize Calico datastore
- Trying curl on the node:
$ curl -k https://10.233.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default
{
"kind": "Status",
"apiVersion": "v1",
"metadata": {},
"status": "Failure",
"message": "clusterinformations.crd.projectcalico.org \"default\" is forbidden: User \"system:anonymous\" cannot get resource \"clusterinformations\" in API group \"crd.projectcalico.org\" at the cluster scope",
"reason": "Forbidden",
"details": {
"name": "default",
"group": "crd.projectcalico.org",
"kind": "clusterinformations"
},
"code": 403
}
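The 403 above is actually encouraging at the network level: the node reaches the apiserver and is merely rejected by RBAC (it connected as `system:anonymous`), whereas the controller pod gets an i/o timeout, i.e. no connectivity at all. A small sketch of extracting the status code from the saved response (body condensed from this thread):

```shell
# Response body condensed from the curl output in this thread.
resp='{"kind":"Status","apiVersion":"v1","status":"Failure","reason":"Forbidden","code":403}'

# Pull out the numeric code field.
code=$(printf '%s' "$resp" | sed -n 's/.*"code":[[:space:]]*\([0-9]*\).*/\1/p')
echo "apiserver replied with HTTP $code"
# 403 => the TCP connection and TLS handshake succeeded; only RBAC
# denied the request. An i/o timeout (as in the controller log) means
# the pod network path to 10.233.0.1:443 is broken.
```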
- iptables rules referencing cali40 ipsets:
$ iptables-save | grep cali40
-A cali-INPUT -p udp -m comment --comment "cali:J76FwxInZIsk7uKY" -m comment --comment "Allow IPv4 VXLAN packets from allowed hosts" -m multiport --dports 4789 -m set --match-set cali40all-vxlan-net src -m addrtype --dst-type LOCAL -j ACCEPT
-A cali-OUTPUT -p udp -m comment --comment "cali:ClE20y3NCwgoEuMI" -m comment --comment "Allow IPv4 VXLAN packets to other allowed hosts" -m multiport --dports 4789 -m addrtype --src-type LOCAL -m set --match-set cali40all-vxlan-net dst -j ACCEPT
-A cali-forward-check -p tcp -m comment --comment "cali:ZD-6UxuUtGW-xtzg" -m comment --comment "To kubernetes NodePort service" -m multiport --dports 30000:32767 -m set --match-set cali40this-host dst -g cali-set-endpoint-mark
-A cali-forward-check -p udp -m comment --comment "cali:CbPfUajQ2bFVnDq4" -m comment --comment "To kubernetes NodePort service" -m multiport --dports 30000:32767 -m set --match-set cali40this-host dst -g cali-set-endpoint-mark
-A cali-forward-check -m comment --comment "cali:jmhU0ODogX-Zfe5g" -m comment --comment "To kubernetes service" -m set ! --match-set cali40this-host dst -j cali-set-endpoint-mark
# Warning: iptables-legacy tables present, use iptables-legacy-save to see them
-A cali-nat-outgoing -m comment --comment "cali:flqWnvo8yq4ULQLa" -m set --match-set cali40masq-ipam-pools src -m set ! --match-set cali40all-ipam-pools dst -j MASQUERADE --random-fully
@yckaolalala Interesting that the ipset doesn't appear in iptables at the point you recorded it. Could you please provide the full Felix log?
@matthewdupre Are there any other logs I should provide?
- calico-node.yaml
- name: FELIX_LOGSEVERITYSCREEN
value: "debug"
# Set Calico startup logging to "debug"
- name: CALICO_STARTUP_LOGLEVEL
value: "debug"
- pod status
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
calico-kube-controllers-794577df96-tklxp 0/1 CrashLoopBackOff 10 (65s ago) 18m 10.233.107.129 k8s-192-168-56-198 <none> <none>
calico-node-sscjq 1/1 Running 0 3m16s 192.168.56.198 k8s-192-168-56-198 <none> <none>
kube-apiserver-k8s-192-168-56-198 1/1 Running 1 19m 192.168.56.198 k8s-192-168-56-198 <none> <none>
kube-controller-manager-k8s-192-168-56-198 1/1 Running 2 20m 192.168.56.198 k8s-192-168-56-198 <none> <none>
kube-proxy-9n5ps 1/1 Running 0 19m 192.168.56.198 k8s-192-168-56-198 <none> <none>
kube-scheduler-k8s-192-168-56-198 1/1 Running 2 (16m ago) 20m 192.168.56.198 k8s-192-168-56-198 <none> <none>
- /var/log/calico/cni/cni.log: cni.log
- calico-node log: calico-node-debug.log
- calico-kube-controllers log: calico-kube-controller.log
- iptables-save: iptables-save.log
- ip a: ipa.log
- ipset list: ipset.log
I think the problem is that pods cannot connect to kube-apiserver even though calico-node is running. However, from the node itself I can successfully connect to kube-apiserver using both the node IP and the default Kubernetes service IP.
curl -k https://10.233.0.1:443/version
curl -k https://192.168.56.198:6443/version
{
"major": "1",
"minor": "27",
"gitVersion": "v1.27.5",
"gitCommit": "93e0d7146fb9c3e9f68aa41b2b4265b2fcdb0a4c",
"gitTreeState": "clean",
"buildDate": "2023-08-24T00:42:11Z",
"goVersion": "go1.20.7",
"compiler": "gc",
"platform": "linux/amd64"
}
I attempted to change the Calico datastore from kdd to etcd. After starting calico-node, the original issue persists, but the ipset cali40all-hosts-net is removed.
calico-kube-controllers can start normally because it no longer needs to connect to kube-apiserver, but other pods still cannot connect to kube-apiserver. As a result, my CoreDNS fails to start.
Strangely, all these problems were resolved after a reboot.
I also attempted to remove Calico and change the network plugin to Flannel; CoreDNS can then start without a reboot.
I have met the same problem.
Warning: iptables-legacy tables present, use iptables-legacy-save to see them
This message is a bit suspicious - is there a process running on this node that is using legacy iptables?
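One way to act on that suspicion is to dump the legacy backend directly; rules loaded via the legacy API can keep an ipset pinned even when the nft-backend dump shows nothing. A sketch of counting such rules, run here against a hypothetical sample dump (the rule shown is an assumption for illustration; on a live node replace the sample with the real output of `iptables-legacy-save 2>/dev/null`):

```shell
# Hypothetical sample of what iptables-legacy-save might return on an
# affected node; the rule below is illustrative, not taken from a real
# dump. On a live node use: legacy=$(iptables-legacy-save 2>/dev/null)
legacy='# Generated by iptables-legacy-save
*filter
-A INPUT -m set --match-set cali40all-hosts-net src -j ACCEPT
COMMIT'

# Count rule lines (-A ...) held by the legacy backend.
n=$(printf '%s\n' "$legacy" | grep -c '^-A' || true)
echo "legacy rules: $n"
# Any non-zero count means a process loaded rules through the legacy
# API; those rules hold ipset references that nft-backend tools miss.
```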
I am closing this issue due to lack of activity and because Calico has moved up to v3.28. Feel free to reopen if there is any new info.