Kubernetes CNI chained plugin configuration ignored on v0.4.x

Open Cerebus opened this issue 3 years ago • 2 comments

Installing a chained CNI plugin seems to be ignored on v0.4.x. The CNI config gets set properly in /var/lib/rancher/k3s/agent/etc/cni/net.d/ and the plugin executable gets installed properly in /var/lib/rancher/k3s/data/current/bin/, but it's never called. No CNI-related logs are emitted, and Pods come up only with the default flannel configuration.

The CNI in question works like multus; it delegates to the default plugin (flannel) first and does other things later in the chain without interfering with it.
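To illustrate what "chained" means here, the following is a minimal sketch that appends one extra plugin to the end of a flannel chain. The base conflist is an assumption about what k3s writes by default (the name, `cniVersion`, and delegate options on a real node may differ), and the `bandwidth` plugin only stands in for the actual plugin, which is not public. `append_chained_plugin` is a hypothetical helper, not part of any CNI tooling.

```python
import copy
import json

# ASSUMPTION: approximate shape of the default k3s flannel chain.
# Check the real file under /var/lib/rancher/k3s/agent/etc/cni/net.d/ on a node.
DEFAULT_CONFLIST = {
    "name": "cbr0",
    "cniVersion": "1.0.0",
    "plugins": [
        {
            "type": "flannel",
            "delegate": {
                "hairpinMode": True,
                "forceAddress": True,
                "isDefaultGateway": True,
            },
        },
        {"type": "portmap", "capabilities": {"portMappings": True}},
    ],
}

# Stand-in for the extra plugin; "bandwidth" is one of the reference CNI plugins.
EXTRA_PLUGIN = {"type": "bandwidth", "capabilities": {"bandwidth": True}}


def append_chained_plugin(conflist, plugin):
    """Return a copy of conflist with plugin added at the end of the chain.

    Plugins in a conflist run in list order on ADD, so the appended plugin
    runs after flannel and portmap have set up the interface.
    """
    chained = copy.deepcopy(conflist)
    chained["plugins"].append(copy.deepcopy(plugin))
    return chained


if __name__ == "__main__":
    print(json.dumps(append_chained_plugin(DEFAULT_CONFLIST, EXTRA_PLUGIN), indent=2))
```

The extra plugin goes last so that it runs only after the default plugins have finished, which matches the "delegate first, do other things later" behavior described above.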

This works in v0.3.x. Using docker driver in both cases.

I'm going to guess that this has something to do with the addition of embedded networking in v0.4.0. I'm open to workarounds; colima is a smoother experience in my environment than minikube, but I do a lot of CNI-related things, so it would be nice to have it working again.

Cerebus avatar Aug 01 '22 18:08 Cerebus

Yeah. I believe this is mainly due to the use of https://github.com/Mirantis/cri-dockerd to cater for the deprecation of docker support in k3s.

Do you mind providing steps to simulate your scenario? I will give it a go and see if there is a possible fix.

abiosoft avatar Aug 01 '22 20:08 abiosoft

Unfortunately I don't have a public repo I can point you at, or anything remotely ready-to-run.

The bandwidth plugin should trigger this behavior, I think, but will require more manual setup.
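As a rough sketch of what a bandwidth-based reproduction could look like, the snippet below generates a Pod manifest carrying the `kubernetes.io/ingress-bandwidth` and `kubernetes.io/egress-bandwidth` annotations. The pod name, image, and limits are made up for illustration, and `bandwidth_test_pod` is a hypothetical helper. The annotations are only honored if the `bandwidth` plugin is present in the CNI chain and the chain is actually invoked, so missing traffic shaping on the pod would suggest the chain is being ignored.

```python
import json


def bandwidth_test_pod(name="bw-test", ingress="1M", egress="1M"):
    """Build a Pod manifest that asks the bandwidth CNI plugin for rate limits.

    All default values here are illustrative assumptions.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": name,
            "annotations": {
                "kubernetes.io/ingress-bandwidth": ingress,
                "kubernetes.io/egress-bandwidth": egress,
            },
        },
        "spec": {
            "containers": [
                {
                    "name": "sleep",
                    "image": "busybox",
                    "command": ["sleep", "3600"],
                }
            ]
        },
    }


if __name__ == "__main__":
    # kubectl accepts JSON manifests as well as YAML.
    print(json.dumps(bandwidth_test_pod(), indent=2))
```

If saved as, say, `bw_pod.py` (a hypothetical file name), the output can be piped straight into `kubectl apply -f -`.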

Multus or meshnet should also hit this issue, but both would have to be tweaked. E.g., meshnet's DaemonSet has to point to the correct volumes[].hostPath to find the CNI directories (the DaemonSet installs the config chain and the meshnet plugin binary):

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: meshnet
  labels:
    k8s-app: meshnet
spec:
  selector:
    matchLabels:
      name: meshnet
  template:
    metadata:
      labels:
        name: meshnet
    spec:
      hostNetwork: true
      hostPID: true
      hostIPC: true
      serviceAccountName: meshnet
      nodeSelector:
        beta.kubernetes.io/arch: amd64
      tolerations:
        - operator: Exists
          effect: NoSchedule
      containers:
        - name: meshnet
          securityContext:
            privileged: true
          image: networkop/meshnet:latest
          imagePullPolicy: IfNotPresent
          resources:
            limits:
              memory: 200Mi
            requests:
              cpu: 100m
              memory: 200Mi
          env:
            - name: HOST_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.hostIP
          volumeMounts:
            - name: cni-cfg
              mountPath: /etc/cni/net.d
            - name: cni-bin
              mountPath: /opt/cni/bin
            - name: var-run-netns
              mountPath: /var/run/netns
              mountPropagation: Bidirectional
      terminationGracePeriodSeconds: 30
      volumes:
        - name: cni-bin
          hostPath:
            path: /var/lib/rancher/k3s/data/current/bin
        - name: cni-cfg
          hostPath:
            path: /var/lib/rancher/k3s/agent/etc/cni/net.d
        - name: var-run-netns
          hostPath:
            path: /var/run/netns

But this will naturally need all the rest of the meshnet deployment (crds, ns, sa, clusterrole, etc.), plus the demonstration (i.e. a couple of nodes and a Topology to connect them).

Cerebus avatar Aug 01 '22 22:08 Cerebus

@Cerebus changing the path should work.

volumes:
  - name: cni-bin
    hostPath:
      path: /usr/libexec/cni
  - name: cni-cfg
    hostPath:
      path: /etc/cni/net.d

abiosoft avatar Oct 18 '22 03:10 abiosoft

Those paths

@Cerebus changing the path should work.

volumes:
  - name: cni-bin
    hostPath:
      path: /usr/libexec/cni
  - name: cni-cfg
    hostPath:
      path: /etc/cni/net.d

Nope. /etc/cni/net.d doesn't exist in a k3s deployment; it's in /var/lib/rancher. Second, the stuff in libexec is ignored by k3s; it installs its own binaries in /var/lib/rancher as above.

ETA: with the docker runtime. Works with the containerd runtime, but I need dockerd as well.

Cerebus avatar Oct 26 '22 15:10 Cerebus

ETA: with the docker runtime. Works with the containerd runtime, but I need dockerd as well.

This is mainly what I'm trying to confirm.

The CNI setup is ignored for the docker runtime; that's most likely the cause.

Are you available to assist with testing? I can push out a quick fix for this.

abiosoft avatar Oct 26 '22 16:10 abiosoft

In a slow loop, yes. :)

Cerebus avatar Oct 26 '22 16:10 Cerebus