
ipset cali40all-hosts-net cannot be destroyed and calico-kube-controllers fails to initialize the Calico datastore

Open yckaolalala opened this issue 2 years ago • 7 comments

I reinstalled my Kubernetes cluster and the Calico version upgraded from v3.20.3 to v3.25.2.

Expected Behavior

The ipset can be destroyed and calico-node runs normally.

Current Behavior

The ipset cali40all-hosts-net cannot be destroyed, and Calico fails to run.

  • calico-node log
2023-11-07 10:03:05.205 [INFO][93] felix/ipsets.go 616: Resync found left-over Calico IP set. Queueing deletion. family="inet" setName="cali40all-hosts-net"
2023-11-07 10:03:05.206 [INFO][93] felix/ipsets.go 883: Deleting IP set. family="inet" setName="cali40all-hosts-net"
2023-11-07 10:03:05.206 [INFO][93] felix/ipsets.go 921: Deleting IP set. family="inet" setName="cali40all-hosts-net"
2023-11-07 10:03:05.287 [WARNING][93] felix/ipsets.go 927: Failed to delete IP set, may be out-of-sync. error=exit status 1 family="inet" output="ipset v7.11: Set cannot be destroyed: it is in use by a kernel component\n" setName="cali40all-hosts-net"
  • cali40all-hosts-net
$ ipset list cali40all-hosts-net

Name: cali40all-hosts-net
Type: hash:net
Revision: 6
Header: family inet hashsize 1024 maxelem 1048576
Size in memory: 504
References: 2
Number of entries: 1
Members:
192.168.56.198
  • cannot destroy cali40all-hosts-net
$ ipset destroy cali40all-hosts-net

ipset v7.15: Set cannot be destroyed: it is in use by a kernel component
  • no rule matches cali40all-hosts-net in iptables (the grep returns nothing).
$ iptables -nvL | grep cali40all-hosts-net
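The `References: 2` line in the `ipset list` output above is the kernel's reference count for the set; `ipset destroy` can only succeed once it drops to zero. A minimal sketch of extracting it, using a hypothetical helper (`ipset_refs` is not part of ipset or Calico), with the captured output standing in for a live call:

```shell
# Hypothetical helper: read `ipset list <name>` output on stdin and print
# the kernel reference count. Destroy only succeeds when this reaches 0.
ipset_refs() {
  awk -F': ' '/^References:/ { print $2 }'
}

# Normally this would be: ipset list cali40all-hosts-net | ipset_refs
# Here the output captured above stands in for the live command.
refs=$(printf 'Name: cali40all-hosts-net\nType: hash:net\nReferences: 2\n' | ipset_refs)
echo "references: $refs"
```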

Steps to Reproduce (for bugs)

  1. Deploy Kubernetes using Kubespray v2.18.1. This installation includes Kubernetes v1.22.8 and Calico v3.20.3.
  2. Run reset.yml in kubespray to clean cluster.
  3. Deploy Kubernetes using Kubespray v2.23.0. This installation includes Kubernetes v1.27.5 and Calico v3.25.2.

Your Environment

  • Calico version
v3.20.3 to v3.25.2
  • Orchestrator version (e.g. kubernetes, mesos, rkt):
kubernetes:  v1.22.8 to v1.27.5
kubespray:  v2.18.1 to v2.23.0
  • Operating System and version:
PRETTY_NAME="Ubuntu 22.04.2 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.2 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian

yckaolalala avatar Nov 08 '23 03:11 yckaolalala

FYI iptables -L doesn't show you all the rules: it only shows you the filter table. Better to use iptables-save to review everything.

That warning message by itself shouldn't be too much of a problem though: are you switching from IPIP to something else (intentionally)? That ipset is used in IPIP mode only.

There's probably another cause for any installation problems you have (switching from IPIP to something that doesn't work in your environment could be it), but more info would be needed.
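To illustrate the `iptables -L` blind spot: it only prints the filter table, while the `*-save` commands dump every table. A sketch of searching a ruleset dump for the set name (the one-line sample here stands in for real `iptables-save` / `iptables-legacy-save` output, which requires root):

```shell
# Sample line standing in for a live `iptables-save` dump (run as root).
dump='-A cali-INPUT -p udp -m set --match-set cali40all-vxlan-net src -j ACCEPT'

# Search every table's rules (the dump covers filter, nat, mangle, raw)
# for a reference to the stuck set.
if printf '%s\n' "$dump" | grep -q 'cali40all-hosts-net'; then
  echo "set is referenced in this dump"
else
  echo "set not referenced in this dump"
fi
```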

matthewdupre avatar Nov 08 '23 23:11 matthewdupre

It was fixed after a REBOOT: the ipset was destroyed and calico-kube-controllers could start. I can reproduce the issue on my second node.

Before reboot.

  • calico-kube-controllers is CrashLoopBackOff
NAMESPACE     NAME                                         READY   STATUS             RESTARTS        AGE
kube-system   calico-kube-controllers-5b5bfd6db7-pt2wr     0/1     CrashLoopBackOff   7 (2m15s ago)   14m
2023-11-09 02:24:18.167 [INFO][1] main.go 107: Loaded configuration from environment config=&config.Config{LogLevel:"info", WorkloadEndpointWorkers:1, ProfileWorkers:1, PolicyWorkers:1, NodeWorkers:1, Kubeconfig:"", DatastoreType:"kubernetes"}
W1109 02:24:18.167890       1 client_config.go:617] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
2023-11-09 02:24:18.168 [INFO][1] main.go 131: Ensuring Calico datastore is initialized
2023-11-09 02:24:48.169 [ERROR][1] client.go 290: Error getting cluster information config ClusterInformation="default" error=Get "https://10.233.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default": dial tcp 10.233.0.1:443: i/o timeout
2023-11-09 02:24:48.169 [INFO][1] main.go 138: Failed to initialize datastore error=Get "https://10.233.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default": dial tcp 10.233.0.1:443: i/o timeout
2023-11-09 02:25:18.191 [ERROR][1] client.go 290: Error getting cluster information config ClusterInformation="default" error=Get "https://10.233.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default": dial tcp 10.233.0.1:443: i/o timeout
2023-11-09 02:25:18.191 [INFO][1] main.go 138: Failed to initialize datastore error=Get "https://10.233.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default": dial tcp 10.233.0.1:443: i/o timeout
2023-11-09 02:25:18.191 [FATAL][1] main.go 151: Failed to initialize Calico datastore
  • trying curl from the node
$ curl -k https://10.233.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default

{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "clusterinformations.crd.projectcalico.org \"default\" is forbidden: User \"system:anonymous\" cannot get resource \"clusterinformations\" in API group \"crd.projectcalico.org\" at the cluster scope",
  "reason": "Forbidden",
  "details": {
    "name": "default",
    "group": "crd.projectcalico.org",
    "kind": "clusterinformations"
  },
  "code": 403
}
  • iptables-save output
$ iptables-save | grep cali40

-A cali-INPUT -p udp -m comment --comment "cali:J76FwxInZIsk7uKY" -m comment --comment "Allow IPv4 VXLAN packets from allowed hosts" -m multiport --dports 4789 -m set --match-set cali40all-vxlan-net src -m addrtype --dst-type LOCAL -j ACCEPT
-A cali-OUTPUT -p udp -m comment --comment "cali:ClE20y3NCwgoEuMI" -m comment --comment "Allow IPv4 VXLAN packets to other allowed hosts" -m multiport --dports 4789 -m addrtype --src-type LOCAL -m set --match-set cali40all-vxlan-net dst -j ACCEPT
-A cali-forward-check -p tcp -m comment --comment "cali:ZD-6UxuUtGW-xtzg" -m comment --comment "To kubernetes NodePort service" -m multiport --dports 30000:32767 -m set --match-set cali40this-host dst -g cali-set-endpoint-mark
-A cali-forward-check -p udp -m comment --comment "cali:CbPfUajQ2bFVnDq4" -m comment --comment "To kubernetes NodePort service" -m multiport --dports 30000:32767 -m set --match-set cali40this-host dst -g cali-set-endpoint-mark
-A cali-forward-check -m comment --comment "cali:jmhU0ODogX-Zfe5g" -m comment --comment "To kubernetes service" -m set ! --match-set cali40this-host dst -j cali-set-endpoint-mark
# Warning: iptables-legacy tables present, use iptables-legacy-save to see them
-A cali-nat-outgoing -m comment --comment "cali:flqWnvo8yq4ULQLa" -m set --match-set cali40masq-ipam-pools src -m set ! --match-set cali40all-ipam-pools dst -j MASQUERADE --random-fully
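Note that the 403 from curl on the node actually shows the network path to the apiserver is healthy from there: the request reached the apiserver and was merely denied for `system:anonymous`, whereas the pod's `i/o timeout` means its packets never arrived at all. A sketch of the distinction (`classify_probe` is a hypothetical helper; curl exit code 28 is its documented timeout code):

```shell
# Hypothetical helper: interpret an apiserver probe.
#   $1 = curl exit code, $2 = HTTP status (empty if none received)
# curl exit 28 = operation timed out -> network path broken (the pod's case).
# exit 0 + HTTP 403 -> connection worked, only authz failed (the node's case).
classify_probe() {
  if [ "$1" -eq 28 ]; then
    echo "network: timeout reaching apiserver"
  elif [ "$1" -eq 0 ] && [ "$2" = "403" ]; then
    echo "network OK: request reached apiserver (auth denied)"
  else
    echo "other: curl=$1 http=$2"
  fi
}

classify_probe 0 403   # the node's curl result
classify_probe 28 ""   # the pod's i/o timeout
```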

yckaolalala avatar Nov 09 '23 02:11 yckaolalala

@yckaolalala Interesting that the ipset doesn't appear in iptables at the point you recorded it. Could you please provide the full Felix log?

matthewdupre avatar Nov 20 '23 23:11 matthewdupre

@matthewdupre Are there any other logs I should provide?

  • calico-node.yaml
            - name: FELIX_LOGSEVERITYSCREEN
              value: "debug"
            # Set Calico startup logging to "debug"
            - name: CALICO_STARTUP_LOGLEVEL
              value: "debug"
  • pod status
NAME                                         READY   STATUS             RESTARTS       AGE     IP               NODE                 NOMINATED NODE   READINESS GATES
calico-kube-controllers-794577df96-tklxp     0/1     CrashLoopBackOff   10 (65s ago)   18m     10.233.107.129   k8s-192-168-56-198   <none>           <none>
calico-node-sscjq                            1/1     Running            0              3m16s   192.168.56.198   k8s-192-168-56-198   <none>           <none>
kube-apiserver-k8s-192-168-56-198            1/1     Running            1              19m     192.168.56.198   k8s-192-168-56-198   <none>           <none>
kube-controller-manager-k8s-192-168-56-198   1/1     Running            2              20m     192.168.56.198   k8s-192-168-56-198   <none>           <none>
kube-proxy-9n5ps                             1/1     Running            0              19m     192.168.56.198   k8s-192-168-56-198   <none>           <none>
kube-scheduler-k8s-192-168-56-198            1/1     Running            2 (16m ago)    20m     192.168.56.198   k8s-192-168-56-198   <none>           <none>

yckaolalala avatar Nov 21 '23 07:11 yckaolalala

I think the problem is that pods cannot connect to the kube-apiserver even though calico-node is running. However, from the node itself I can successfully connect to the kube-apiserver using both the node IP and the default Kubernetes service IP.

curl -k https://10.233.0.1:443/version
curl -k https://192.168.56.198:6443/version
{
  "major": "1",
  "minor": "27",
  "gitVersion": "v1.27.5",
  "gitCommit": "93e0d7146fb9c3e9f68aa41b2b4265b2fcdb0a4c",
  "gitTreeState": "clean",
  "buildDate": "2023-08-24T00:42:11Z",
  "goVersion": "go1.20.7",
  "compiler": "gc",
  "platform": "linux/amd64"
}

I attempted to change the Calico datastore from kdd to etcd. After starting calico-node, the original issue persists, but the ipset cali40all-hosts-net is removed. calico-kube-controllers starts normally because it no longer needs to connect to the kube-apiserver, but other pods still cannot reach the kube-apiserver. As a result, my CoreDNS fails to start.

Strangely, all these problems were resolved after a reboot.

I also attempted to remove Calico and switch the network plugin to Flannel; with Flannel, CoreDNS starts without a reboot.

yckaolalala avatar Nov 21 '23 08:11 yckaolalala

I have hit the same problem.

chenguoquan1024 avatar Mar 13 '24 03:03 chenguoquan1024

Warning: iptables-legacy tables present, use iptables-legacy-save to see them

This message is a bit suspicious - is there a process running on this node that is using legacy iptables?
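One way to check that suspicion: if rules exist under both the nft and legacy backends, a kernel component referencing the ipset could live in tables that `iptables-save` (nft backend) never shows. A sketch of counting rules under each backend (the sample one-line dumps stand in for the real `iptables-save` / `iptables-legacy-save` output, which requires root on the node):

```shell
# Sample dumps standing in for `iptables-save` and `iptables-legacy-save`.
# A non-zero legacy count alongside nft rules means two backends are in play.
nft_dump='-A cali-INPUT -j ACCEPT'
legacy_dump='-A KUBE-FIREWALL -j DROP'

# Count appended rules ("-A" lines) in a dump read from stdin;
# `|| true` keeps exit status 0 when grep finds no matches.
count_rules() { grep -c '^-A' || true; }

echo "nft rules:    $(printf '%s\n' "$nft_dump" | count_rules)"
echo "legacy rules: $(printf '%s\n' "$legacy_dump" | count_rules)"
```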

caseydavenport avatar Apr 02 '24 16:04 caseydavenport

I am closing this issue due to lack of activity and because Calico has moved up to v3.28. Feel free to reopen if there is any new info.

tomastigera avatar Jul 30 '24 17:07 tomastigera