
[BUG] Pod network failing to start when installing calico operator with k3d v5.2.1

Open Glen-Tigera opened this issue 3 years ago • 20 comments

What did you do

  • How was the cluster created? (a config-file equivalent of the command below is sketched after this list)

    • k3d cluster create "k3s-default" --k3s-arg '--flannel-backend=none@server:*'
  • What did you do afterwards? I tried to install the calico or tigera operator onto the cluster with containerIPForwarding enabled.

kubectl apply -f https://docs.projectcalico.org/manifests/tigera-operator.yaml
curl -L https://docs.projectcalico.org/manifests/custom-resources.yaml > k3d-custom-res.yaml
yq e '.spec.calicoNetwork.containerIPForwarding="Enabled"' -i k3d-custom-res.yaml
kubectl apply -f k3d-custom-res.yaml
  • k3d commands?

  • docker commands? docker ps to check running containers; docker exec -ti <node> /bin/sh to get a shell inside a node container

  • OS operations (e.g. shutdown/reboot)? Ran Linux system commands (ls, cat, etc.) inside pods and containers
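
For reference, the same cluster can also be declared as a k3d config file; a minimal sketch, assuming the v1alpha3 config schema that k3d v5 uses:

apiVersion: k3d.io/v1alpha3
kind: Simple
name: k3s-default
options:
  k3s:
    extraArgs:
      - arg: --flannel-backend=none
        nodeFilters:
          - server:*

which you would then create with: k3d cluster create --config k3d-config.yaml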

What did you expect to happen

The pod network should come up successfully and all pods in every namespace should reach the Running state.

Screenshots or terminal output

The calico-node pods run without issue, but other pods are stuck in the ContainerCreating state (coredns, metrics-server, calico-kube-controllers):

$ kubectl get pods -A
NAMESPACE         NAME                                       READY   STATUS              RESTARTS   AGE
tigera-operator   tigera-operator-7dc6bc5777-h5sp7           1/1     Running             0          106s
calico-system     calico-typha-9b59bcc69-w2ml8               1/1     Running             0          83s
calico-system     calico-kube-controllers-78cc777977-8xf5v   0/1     ContainerCreating   0          83s
kube-system       coredns-7448499f4d-8pwtf                   0/1     ContainerCreating   0          106s
kube-system       metrics-server-86cbb8457f-h26x4            0/1     ContainerCreating   0          106s
kube-system       helm-install-traefik-h6qhh                 0/1     ContainerCreating   0          106s
kube-system       helm-install-traefik-crd-8xsxm             0/1     ContainerCreating   0          106s
kube-system       local-path-provisioner-5ff76fc89d-ql55s    0/1     ContainerCreating   0          106s
calico-system     calico-node-6xbq7                          1/1     Running             0          83s

When describing a stuck pod, I see this in its events:

$ kubectl describe pod/calico-kube-controllers-78cc777977-8xf5v -n calico-system

  Warning  FailedCreatePodSandBox  3s                    kubelet            (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "b474a530f7b8727fc101404ebb551135059f5aa359beb50bae176fd05cf2c20d": netplugin failed with no error message: fork/exec /opt/cni/bin/calico: no such file or directory

Based on the error above, I went to check /opt/cni/bin/calico to see if the calico binary existed in the container, which it does:

glen@glen-tigera: $ docker exec -ti k3d-k3s-default-server-0 /bin/sh
/ # ls
bin  dev  etc  k3d  lib  opt  output  proc  run  sbin  sys  tmp  usr  var
/ # cd /opt/cni/bin/
/opt/cni/bin # ls -a
.  ..  bandwidth  calico  calico-ipam  flannel  host-local  install  loopback  portmap  tags.txt  tuning

CNI Config Yaml: kubectl get cm cni-config -n calico-system -o yaml

apiVersion: v1
data:
  config: |-
    {
      "name": "k8s-pod-network",
      "cniVersion": "0.3.1",
      "plugins": [
        {
          "type": "calico",
          "datastore_type": "kubernetes",
          "mtu": 0,
          "nodename_file_optional": false,
          "log_level": "Info",
          "log_file_path": "/var/log/calico/cni/cni.log",
          "ipam": { "type": "calico-ipam", "assign_ipv4" : "true", "assign_ipv6" : "false"},
          "container_settings": {
              "allow_ip_forwarding": true
          },
          "policy": {
              "type": "k8s"
          },
          "kubernetes": {
              "k8s_api_root":"https://10.43.0.1:443",
              "kubeconfig": "__KUBECONFIG_FILEPATH__"
          }
        },
        {
          "type": "bandwidth",
          "capabilities": {"bandwidth": true}
        },
        {"type": "portmap", "snat": true, "capabilities": {"portMappings": true}}
      ]
    }
kind: ConfigMap
metadata:
  creationTimestamp: "2021-12-17T18:02:24Z"
  name: cni-config
  namespace: calico-system
  ownerReferences:
  - apiVersion: operator.tigera.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: Installation
    name: default
    uid: c53d18b5-efc6-4155-879b-6097a8c2c14c
  resourceVersion: "675"
  uid: 003c9cdc-0ef5-4d63-8d30-d6e1ed79d4c0
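
(To pull just the embedded CNI config out of that ConfigMap for inspection, e.g. to confirm the allow_ip_forwarding setting, something like this works; a sketch, assuming jq is installed:)

kubectl get cm cni-config -n calico-system -o jsonpath='{.data.config}' | jq '.plugins[0].container_settings'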

Which OS & Architecture

OS: GNU/Linux
Kernel Version: 20.04.2-Ubuntu SMP
Kernel Release: 5.11.0-40-generic
Processor/HW Platform/Machine Architecture: x86_64

Which version of k3d

k3d version v5.2.1
k3s version v1.21.7-k3s1 (default)

Which version of docker

docker version:

Client: Docker Engine - Community
 Version:           20.10.11
 API version:       1.41
 Go version:        go1.16.9
 Git commit:        dea9396
 Built:             Thu Nov 18 00:37:06 2021
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.11
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.16.9
  Git commit:       847da18
  Built:            Thu Nov 18 00:35:15 2021
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.12
  GitCommit:        7b11cfaabd73bb80907dd23182b9347b4245eb5d
 runc:
  Version:          1.0.2
  GitCommit:        v1.0.2-0-g52b36a2
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

docker info:

Client:
 Context:    default
 Debug Mode: false
 Plugins:
  app: Docker App (Docker Inc., v0.9.1-beta3)
  buildx: Build with BuildKit (Docker Inc., v0.6.3-docker)
  scan: Docker Scan (Docker Inc., v0.9.0)

Server:
 Containers: 20
  Running: 0
  Paused: 0
  Stopped: 20
 Images: 22
 Server Version: 20.10.11
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 7b11cfaabd73bb80907dd23182b9347b4245eb5d
 runc version: v1.0.2-0-g52b36a2
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: default
 Kernel Version: 5.11.0-40-generic
 Operating System: Ubuntu 20.04.3 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 16
 Total Memory: 31.09GiB
 Name: glen-tigera
 ID: 6EZ7:QGFF:Z2KK:Q7K3:YKGI:6FIS:X2UP:JX5W:UGXA:FIZW:CYV6:RDDU
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

Glen-Tigera avatar Dec 17 '21 18:12 Glen-Tigera

@Glen-Tigera and I work for Tigera on Calico - we're trying to get operator install working with k3d (so we can add it to our overnight test runs).

We're baffled by this. From our side, we can see the calico binaries and config being written to the node, yet kubelet still complains that it can't find the files.

We tried an operator install with k3s, and that works fine, so I don't think it's the OS.

Just wondered if you had any tips for what to try next.

lwr20 avatar Dec 17 '21 18:12 lwr20

The instructions in the k3d docs seem to work fine, i.e. https://k3d.io/v5.0.0/usage/advanced/calico.yaml works.
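
For reference, the docs flow is essentially this (a sketch of that manifest-based install, as opposed to the operator-based one above):

k3d cluster create "k3s-default" --k3s-arg '--flannel-backend=none@server:*'
kubectl apply -f https://k3d.io/v5.0.0/usage/advanced/calico.yaml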

Comparing https://k3d.io/v5.0.0/usage/advanced/calico.yaml with https://docs.projectcalico.org/archive/v3.15/manifests/calico.yaml, we see:

lance@lwr20:~/scratch$ diff k3d_calico.yaml orig_calico.yaml
37,39d36
<           "container_settings": {
<             "allow_ip_forwarding": true
<           },
398a396,405
>               allowIPIPPacketsFromWorkloads:
>                 description: 'AllowIPIPPacketsFromWorkloads controls whether Felix
>                   will add a rule to drop IPIP encapsulated traffic from workloads
>                   [Default: false]'
>                 type: boolean
>               allowVXLANPacketsFromWorkloads:
>                 description: 'AllowVXLANPacketsFromWorkloads controls whether Felix
>                   will add a rule to drop VXLAN encapsulated traffic from workloads
>                   [Default: false]'
>                 type: boolean
2095c2102
<                   If not specified, then this is defaulted to "Never" (i.e. IPIP tunneling
---
>                   If not specified, then this is defaulted to "Never" (i.e. IPIP tunelling
2115c2122
<                   tunneling is disabled).
---
>                   tunelling is disabled).
3451,3452d3457
<         - key: node-role.kubernetes.io/master
<           effect: NoSchedule
3463c3468
<           image: calico/cni:v3.15.0
---
>           image: calico/cni:v3.15.5
3485c3490
<           image: calico/cni:v3.15.0
---
>           image: calico/cni:v3.15.5
3521c3526
<           image: calico/pod2daemon-flexvol:v3.15.0
---
>           image: calico/pod2daemon-flexvol:v3.15.5
3532c3537
<           image: calico/node:v3.15.0
---
>           image: calico/node:v3.15.5
3586,3591d3590
<             # Set MTU for the Wireguard tunnel device.
<             - name: FELIX_WIREGUARDMTU
<               valueFrom:
<                 configMapKeyRef:
<                   name: calico-config
<                   key: veth_mtu
3725c3724
<           image: calico/kube-controllers:v3.15.0
---
>           image: calico/kube-controllers:v3.15.5

re. the differences:

  • The FELIX_WIREGUARDMTU setting looks like it's duplicated in the k3d manifest, so I'd hope that that wasn't the issue.
  • There's a toleration on the k3d manifest. But we see calico-node running in the output above, so I don't think that's it?
  • "allow_ip_forwarding": true in the k3d manifest, which we've enabled in the custom-resources by setting spec.calicoNetwork.containerIPForwarding="Enabled"

lwr20 avatar Dec 20 '21 09:12 lwr20

Hi @Glen-Tigera & @lwr20, thanks for moving the issue over here from Slack :+1: I just gave the setup a try myself and, sure enough, see the same issues as you. FWIW, I checked the logs of the node and saw hundreds of lines like the following:

E1220 09:59:37.232636       7 plugins.go:748] Error dynamically probing plugins: Error creating Flexvolume plugin from directory nodeagent~uds, skipping. Error: unexpected end of JSON input
E1220 09:59:37.232851       7 driver-call.go:266] Failed to unmarshal output for command: init, output: "", error: unexpected end of JSON input
W1220 09:59:37.232855       7 driver-call.go:149] FlexVolume: driver call failed: executable: /usr/libexec/kubernetes/kubelet-plugins/volume/exec/nodeagent~uds/uds, args: [init], error: fork/exec /usr/libexec/kubernetes/kubelet-plugins/volume/exec/nodeagent~uds/uds: no such file or directory, output: ""
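
(Those lines come from the k3s node container itself; a quick way to filter for them, assuming the default node name from above:)

docker logs k3d-k3s-default-server-0 2>&1 | grep -iE 'flexvolume|driver-call'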

iwilltry42 avatar Dec 20 '21 10:12 iwilltry42

Awesome, thank you, that gives us a thread to pull on.

lwr20 avatar Dec 20 '21 10:12 lwr20

Googling for that error message, this issue in rke2 popped up: https://github.com/rancher/rke2/issues/234

iwilltry42 avatar Dec 20 '21 10:12 iwilltry42

From https://projectcalico.docs.tigera.io/reference/installation/api, I think this all means we need to set spec.flexVolumePath: "/usr/local/bin/" in the Installation resource in custom-resources
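
Something like this should do it; a sketch, assuming the Installation resource is named default as in the stock custom-resources:

kubectl patch installation default --type=merge -p '{"spec":{"flexVolumePath":"/usr/local/bin/"}}'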

lwr20 avatar Dec 20 '21 11:12 lwr20

I can actually confirm that the file is where it belongs:

docker exec -it k3d-k3s-default-server-0 stat /usr/libexec/kubernetes/kubelet-plugins/volume/exec/nodeagent~uds/uds 
  File: /usr/libexec/kubernetes/kubelet-plugins/volume/exec/nodeagent~uds/uds
  Size: 4987070   	Blocks: 9744       IO Block: 4096   regular file
Device: 68h/104d	Inode: 31471848    Links: 1
Access: (0550/-r-xr-x---)  Uid: (    0/ UNKNOWN)   Gid: (    0/ UNKNOWN)
Access: 2021-12-20 12:09:10.946061593 +0000
Modify: 2021-12-20 12:09:10.762060830 +0000
Change: 2021-12-20 12:09:10.766060847 +0000
 Birth: -

iwilltry42 avatar Dec 20 '21 12:12 iwilltry42

Do you have any idea what https://github.com/rancher/rke2-charts/pull/20/files actually does?

lwr20 avatar Dec 20 '21 13:12 lwr20

@iwilltry42 When I ran the command you posted earlier, there was no such file or directory on my setup:

$ docker exec -it k3d-k3s-default-server-0 stat /usr/libexec/kubernetes/kubelet-plugins/volume/exec/nodeagent~uds/uds
stat: cannot stat '/usr/libexec/kubernetes/kubelet-plugins/volume/exec/nodeagent~uds/uds': No such file or directory

There is no nodeagent~uds directory when I try to look inside the container:

$ docker exec -ti k3d-k3s-default-server-0 ls -a /usr/libexec/kubernetes/kubelet-plugins/volume/exec
.  ..

Glen-Tigera avatar Dec 20 '21 14:12 Glen-Tigera

Do you have any idea what https://github.com/rancher/rke2-charts/pull/20/files actually does?

No idea, but since that's canal, I guess it's not valid for K3s which usually runs flannel.

iwilltry42 avatar Dec 20 '21 14:12 iwilltry42

There is no nodeagent~uds directory when I try to look inside the container:

$ docker exec -ti k3d-k3s-default-server-0 ls -a /usr/libexec/kubernetes/kubelet-plugins/volume/exec
.  ..

False alarm: this is because I had specified spec.flexVolumePath: "/usr/local/bin/" in the Installation.

Glen-Tigera avatar Dec 20 '21 15:12 Glen-Tigera

@lwr20 and I have looked into the /usr/libexec/kubernetes/kubelet-plugins/volume/exec/nodeagent~uds/ directory inside the container. It contains the uds file:

/ # ls -l /usr/libexec/kubernetes/kubelet-plugins/volume/exec/nodeagent~uds/
total 4872
-r-xr-x--- 1 0 0 4987070 Dec 20 15:25 uds

But it seems we can't run it:

/ # /usr/libexec/kubernetes/kubelet-plugins/volume/exec/nodeagent~uds/uds
/bin/sh: /usr/libexec/kubernetes/kubelet-plugins/volume/exec/nodeagent~uds/uds: not found

Glen-Tigera avatar Dec 20 '21 15:12 Glen-Tigera

Do you have any idea what rancher/rke2-charts#20 (files) actually does?

Sorry, just understood how you got there. This is the script that's being executed: https://github.com/projectcalico/calico/blob/master/pod2daemon/flexvol/docker/flexvol.sh

I'm checking the variants of the installation now (the one from the k3d docs and yours) with regard to the uds binary:

Via Operator:

/ # ls -lah /usr/libexec/kubernetes/kubelet-plugins/volume/exec/nodeagent~uds
total 4.8M
drwxr-xr-x 2 0 0 4.0K Dec 21 06:49 .
drwxr-xr-x 3 0 0 4.0K Dec 21 06:49 ..
-r-xr-x--- 1 0 0 4.8M Dec 21 06:49 uds

/ # stat /usr/libexec/kubernetes/kubelet-plugins/volume/exec/nodeagent~uds/uds
  File: /usr/libexec/kubernetes/kubelet-plugins/volume/exec/nodeagent~uds/uds
  Size: 4987070   	Blocks: 9744       IO Block: 4096   regular file
Device: 37h/55d	Inode: 43271409    Links: 1
Access: (0550/-r-xr-x---)  Uid: (    0/ UNKNOWN)   Gid: (    0/ UNKNOWN)
Access: 2021-12-21 06:49:35.595982143 +0000
Modify: 2021-12-21 06:49:35.531982019 +0000
Change: 2021-12-21 06:49:35.531982019 +0000
 Birth: -

/ # /usr/libexec/kubernetes/kubelet-plugins/volume/exec/nodeagent~uds/uds
sh: /usr/libexec/kubernetes/kubelet-plugins/volume/exec/nodeagent~uds/uds: not found

Without Operator:

/ # ls -lah /usr/libexec/kubernetes/kubelet-plugins/volume/exec/nodeagent~uds
total 5.4M
drwxr-xr-x 2 0 0 4.0K Dec 21 06:52 .
drwxr-xr-x 3 0 0 4.0K Dec 21 06:52 ..
-r-xr-x--- 1 0 0 5.4M Dec 21 06:52 uds
/ # stat /usr/libexec/kubernetes/kubelet-plugins/volume/exec/nodeagent~uds/uds 
  File: /usr/libexec/kubernetes/kubelet-plugins/volume/exec/nodeagent~uds/uds
  Size: 5602363   	Blocks: 10944      IO Block: 4096   regular file
Device: 37h/55d	Inode: 42735669    Links: 1
Access: (0550/-r-xr-x---)  Uid: (    0/ UNKNOWN)   Gid: (    0/ UNKNOWN)
Access: 2021-12-21 06:52:46.752353250 +0000
Modify: 2021-12-21 06:52:46.092351969 +0000
Change: 2021-12-21 06:52:46.100351984 +0000
 Birth: -
/ # /usr/libexec/kubernetes/kubelet-plugins/volume/exec/nodeagent~uds/uds 
Usage:
  flexvoldrv [command]

Available Commands:
  help        Help about any command
  init        Flex volume init command.
  mount       Flex volume unmount command.
  unmount     Flex volume unmount command.
  version     Print version

Flags:
  -h, --help   help for flexvoldrv

Use "flexvoldrv [command] --help" for more information about a command.

iwilltry42 avatar Dec 21 '21 06:12 iwilltry42

Regarding the differences between the deployed manifests: the DaemonSet is handled by the operator, which rewrites the image tags from v3.15.5 to v3.21.2. I see that quite a few things changed there around the flexvol part as well, especially since pod2daemon was included in the monorepo :thinking: I tried to use the ImageSet to get back to v3.15.0 for testing with the operator, but then the expected path of e.g. the install-cni script is wrong :thinking:

iwilltry42 avatar Dec 21 '21 07:12 iwilltry42

Ah - the way the tigera-operator works, there's a version of the operator that maps to a version of Calico (since the manifests are baked into it). For v3.15, you'll want to apply: https://docs.projectcalico.org/archive/v3.15/manifests/tigera-operator.yaml

(The intent is to make the upgrade experience better: in an operator-managed cluster, you upgrade Calico by simply applying the uplevel tigera-operator.yaml and it takes care of everything. With the old manifest install, you'd have customised your install in various ways directly in the YAML, so to upgrade you have to get the new YAML, make the same edits as before, then apply and hope you did it right. In an operator setup, all your customisations live in the Installation resource; the new operator reads that and does "the right thing" to apply them.)

lwr20 avatar Dec 21 '21 10:12 lwr20

I tried installing an older version of the operator and CRD (v3.15) on the k3d cluster. That worked, so it's possible the issue is on our side.

k3d cluster create "k3d-test-cluster-3-15" --k3s-arg "--flannel-backend=none@server:*" --k3s-arg "--no-deploy=traefik@server:*"
kubectl apply -f https://docs.projectcalico.org/archive/v3.15/manifests/tigera-operator.yaml
kubectl apply -f https://docs.projectcalico.org/archive/v3.15/manifests/custom-resources.yaml
$ kubectl get pods -A -o wide
NAMESPACE         NAME                                      READY   STATUS    RESTARTS   AGE     IP              NODE                             NOMINATED NODE   READINESS GATES
tigera-operator   tigera-operator-5bf967b87f-8528g          1/1     Running   0          4m      192.168.160.3   k3d-test-cluster-3-15-server-0   <none>           <none>
calico-system     calico-typha-fb8798b8f-kmpqf              1/1     Running   0          3m44s   192.168.160.2   k3d-test-cluster-3-15-agent-0    <none>           <none>
calico-system     calico-kube-controllers-c8496f5c-67ljz    1/1     Running   0          3m44s   192.168.48.65   k3d-test-cluster-3-15-server-0   <none>           <none>
kube-system       local-path-provisioner-5ff76fc89d-rfsrf   1/1     Running   0          4m7s    192.168.48.66   k3d-test-cluster-3-15-server-0   <none>           <none>
kube-system       coredns-7448499f4d-9zfh6                  1/1     Running   0          4m7s    192.168.48.67   k3d-test-cluster-3-15-server-0   <none>           <none>
kube-system       metrics-server-86cbb8457f-j9rs4           1/1     Running   0          4m7s    192.168.48.68   k3d-test-cluster-3-15-server-0   <none>           <none>
calico-system     calico-node-rg6tv                         1/1     Running   0          3m44s   192.168.160.2   k3d-test-cluster-3-15-agent-0    <none>           <none>
calico-system     calico-node-66fzx                         1/1     Running   0          3m44s   192.168.160.3   k3d-test-cluster-3-15-server-0   <none>           <none>
calico-system     calico-typha-fb8798b8f-pb2wj              1/1     Running   0          105s    192.168.160.3   k3d-test-cluster-3-15-server-0   <none>           <none>

Glen-Tigera avatar Dec 21 '21 20:12 Glen-Tigera

Upon further testing, our v3.21 (latest release) operator install seems to no longer be compatible with k3d clusters. I tested the operator starting from v3.15, and every version worked up until v3.21. I followed up with the larger team to discuss further.

k3d-calico-operator-install-findings.txt
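
(The version sweep can be scripted; a rough sketch, assuming the archived manifests keep the same URL layout per version:)

for v in v3.15 v3.16 v3.17 v3.18 v3.19 v3.20; do
  name="calico-${v//./-}"   # k3d cluster names can't contain dots
  k3d cluster create "$name" --k3s-arg '--flannel-backend=none@server:*'
  kubectl apply -f "https://docs.projectcalico.org/archive/$v/manifests/tigera-operator.yaml"
  kubectl apply -f "https://docs.projectcalico.org/archive/$v/manifests/custom-resources.yaml"
  kubectl get pods -A   # check whether everything reaches Running before moving on
  k3d cluster delete "$name"
done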

Glen-Tigera avatar Dec 21 '21 22:12 Glen-Tigera

Ah at least you could track it down to a specific version already 👍 Fingers crossed you'll figure out the root cause.

iwilltry42 avatar Dec 21 '21 23:12 iwilltry42

We ran into this with K3s as well. The same exact issue using the operator.

Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.0", GitCommit:"ab69524f795c42094a6630298ff53f3c3ebab7f4", GitTreeState:"clean", BuildDate:"2021-12-07T18:16:20Z", GoVersion:"go1.17.3", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.5+k3s1", GitCommit:"405bf79da97831749733ad99842da638c8ee4802", GitTreeState:"clean", BuildDate:"2021-12-18T00:43:30Z", GoVersion:"go1.16.10", Compiler:"gc", Platform:"linux/amd64"}

kube-system   svclb-traefik-7wk8q   0/2   CrashLoopBackOff   10 (59s ago)   4m

[root@k3s-master ~]# kubectl logs svclb-traefik-7wk8q
Error from server (NotFound): pods "svclb-traefik-7wk8q" not found
[root@k3s-master ~]# kubectl logs -n kube-system svclb-traefik-7wk8q
error: a container name must be specified for pod svclb-traefik-7wk8q, choose one of: [lb-port-80 lb-port-443]
[root@k3s-master ~]# kubectl logs -n kube-system svclb-traefik-7wk8q -c lb-port-80
+ trap exit TERM INT
+ echo 10.43.233.214
+ grep -Eq :
+ cat /proc/sys/net/ipv4/ip_forward
+ '[' 0 '!=' 1 ]
+ exit 1

With the operator, there is no way we found to set the

"container_settings": { "allow_ip_forwarding": true }

setting. We changed it in /etc/cni/net.d/10-calico.conflist and in the cni-config ConfigMap, but the value kept getting changed back, we assume by the operator.
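
(For what it's worth, the operator-level knob used earlier in this thread, spec.calicoNetwork.containerIPForwarding, is meant to render exactly that CNI setting; a sketch of applying it to a live Installation named default:)

kubectl patch installation default --type=merge -p '{"spec":{"calicoNetwork":{"containerIPForwarding":"Enabled"}}}'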

dlohin avatar Jan 28 '22 13:01 dlohin

Henlo friends.

TLDR: I fixed my problem by pulling down fresh calico and calico-ipam executables, as per the Calico CNI install docs.

OK, so I've been trialing k3os as a platform replacement, which led me here as I was working with Calico; my findings may be of assistance.

I'd disabled flannel (and Traefik and servicelb) when k3os installed. Next I added Calico using the operator, as per the Calico instructions. EXACTLY the same problem as above: the /opt/cni/bin/calico was there, but no one was happy about it.

So I pulled the two executables listed in the basic Calico install instructions* here and, tada, up came the calico-system controller and moments later the two Calico API servers!

* Look for the binary paths to download under the block of text 'Install the CNI plugin Binaries'.
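
For concreteness, that step looks roughly like this when run on the node; a sketch, with the release tag a placeholder to match your Calico version (asset names assumed from the cni-plugin release pages):

curl -L -o /opt/cni/bin/calico https://github.com/projectcalico/cni-plugin/releases/download/v3.20.0/calico-amd64
chmod 755 /opt/cni/bin/calico
curl -L -o /opt/cni/bin/calico-ipam https://github.com/projectcalico/cni-plugin/releases/download/v3.20.0/calico-ipam-amd64
chmod 755 /opt/cni/bin/calico-ipam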

I tried a bunch of different configurations after I solved the problem.

  • Originally I had added the containerIPForwarding: Enabled fellow. Turns out I could remove this, and applying the operator and CRs seemed OK (note: when I say OK, the containers are running; that's as far as I've gone).
  • calicoNetwork.ipPools.cidr can overlap or not; it doesn't seem to impact the pods coming up.

I hope this is of help to all or some of you as your work above really helped me get through.

delprofundo avatar Feb 05 '22 07:02 delprofundo