Kubernetes CNI chained plugin configuration ignored on v0.4.x
A chained CNI plugin appears to be ignored on v0.4.x. The CNI config is written properly to /var/lib/rancher/k3s/agent/etc/cni/net.d/ and the plugin executable is installed properly in /var/lib/rancher/k3s/data/current/bin/, but the plugin is never called. No CNI-related logs are emitted, and Pods come up with only the default flannel configuration.
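For reference, the installed config is a standard CNI conflist with the extra plugin appended after flannel; roughly like this (the flannel/portmap entries mirror the k3s default from memory and may differ by k3s version, and my-chained-plugin is a placeholder for the actual plugin):

{
  "cniVersion": "1.0.0",
  "name": "cbr0",
  "plugins": [
    { "type": "flannel", "delegate": { "hairpinMode": true, "isDefaultGateway": true } },
    { "type": "portmap", "capabilities": { "portMappings": true } },
    { "type": "my-chained-plugin" }
  ]
}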
The CNI in question works like multus; it delegates to the default plugin (flannel) first and does other things later in the chain without interfering with it.
This works on v0.3.x, using the docker runtime in both cases.
I'm going to guess that this has something to do with the addition of embedded networking in v0.4.0. I'm open to workarounds; colima is a smoother experience in my environment than minikube, but I do a lot of CNI-related things, so it would be nice to have it working again.
Yeah. I believe this is mainly due to the use of https://github.com/Mirantis/cri-dockerd to cater for the deprecation of docker support in k3s.
Do you mind providing steps to simulate your scenario? I will give it a go and see if there is a possible fix.
Unfortunately I don't have a public repo I can point you at, or anything remotely ready-to-run.
The bandwidth plugin should trigger this behavior, I think, but will require more manual setup.
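The bandwidth plugin only acts on pods that carry the standard bandwidth annotations, so a test pod would look something like this (if the chain were invoked, tc qdiscs would appear on the pod's veth; on v0.4.x nothing happens):

apiVersion: v1
kind: Pod
metadata:
  name: bandwidth-test
  annotations:
    kubernetes.io/ingress-bandwidth: 1M
    kubernetes.io/egress-bandwidth: 1M
spec:
  containers:
  - name: app
    image: nginx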
Either multus or meshnet should hit this issue, but both will have to be tweaked. E.g., meshnet's daemonset has to point volumes[].hostPath at the correct CNI directories (the ds installs the config chain and the meshnet plugin binary):
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: meshnet
  labels:
    k8s-app: meshnet
spec:
  selector:
    matchLabels:
      name: meshnet
  template:
    metadata:
      labels:
        name: meshnet
    spec:
      hostNetwork: true
      hostPID: true
      hostIPC: true
      serviceAccountName: meshnet
      nodeSelector:
        beta.kubernetes.io/arch: amd64
      tolerations:
      - operator: Exists
        effect: NoSchedule
      containers:
      - name: meshnet
        securityContext:
          privileged: true
        image: networkop/meshnet:latest
        imagePullPolicy: IfNotPresent
        resources:
          limits:
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 200Mi
        env:
        - name: HOST_IP
          valueFrom:
            fieldRef:
              fieldPath: status.hostIP
        volumeMounts:
        - name: cni-cfg
          mountPath: /etc/cni/net.d
        - name: cni-bin
          mountPath: /opt/cni/bin
        - name: var-run-netns
          mountPath: /var/run/netns
          mountPropagation: Bidirectional
      terminationGracePeriodSeconds: 30
      volumes:
      - name: cni-bin
        hostPath:
          path: /var/lib/rancher/k3s/data/current/bin
      - name: cni-cfg
        hostPath:
          path: /var/lib/rancher/k3s/agent/etc/cni/net.d
      - name: var-run-netns
        hostPath:
          path: /var/run/netns
But this will naturally need all the rest of the meshnet deployment (crds, ns, sa, clusterrole, etc.), plus the demonstration (i.e. a couple of nodes and a Topology to connect them).
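For completeness, the demonstration boils down to a pair of pods plus Topology objects along these lines (recalled from the meshnet examples, so the apiVersion and field names may differ across meshnet versions):

apiVersion: networkop.co.uk/v1beta1
kind: Topology
metadata:
  name: r1
spec:
  links:
  - uid: 1
    peer_pod: r2
    local_intf: eth1
    local_ip: 12.12.12.1/24
    peer_intf: eth1
    peer_ip: 12.12.12.2/24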
@Cerebus changing the path should work.
volumes:
- name: cni-bin
  hostPath:
    path: /usr/libexec/cni
- name: cni-cfg
  hostPath:
    path: /etc/cni/net.d
Nope, those paths won't work. First, /etc/cni/net.d doesn't exist in a k3s deployment; the config lives under /var/lib/rancher. Second, k3s ignores the binaries in /usr/libexec/cni; it installs its own in /var/lib/rancher as above.
ETA: with the docker runtime. Works with the containerd runtime, but I need dockerd as well.
This is mainly what I'm trying to confirm.
The CNI setup is ignored for the docker runtime; that's most likely the cause.
Are you available to assist with testing? I can push out a quick fix for this.
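If that's confirmed, the likely fix is to wire cri-dockerd up to the k3s-managed CNI directories, something along these lines (assuming cri-dockerd still exposes the dockershim-era CNI flags):

cri-dockerd \
  --network-plugin=cni \
  --cni-bin-dir=/var/lib/rancher/k3s/data/current/bin \
  --cni-conf-dir=/var/lib/rancher/k3s/agent/etc/cni/net.d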
In a slow loop, yes. :)