
Running kilo on RKE-deployed clusters

Open • jbrinksmeier opened this issue on Jul 31 '20 • 5 comments

Hi! First of all, thank you for your awesome work on this project, much appreciated. We are currently testing Kilo with clusters that we deploy with RKE and later import into Rancher. We use it as the CNI provider in a full-mesh layout. We used kilo-k3s.yaml as our reference and had to lower the mtu setting in the cni-conf.json ConfigMap to 1300: the rancher-node-agent tries to open a wss:// connection to the Rancher server, and that did not succeed with the original 1420 setting. 1300 was just our first lucky shot, so it might be worth testing how high the value can go, but we have had no problems with it so far. Do you think this is worth documenting in this project? If yes, could you suggest a good place (maybe another file in manifests) so that I can open a PR?
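For context on the numbers: 1420 is the standard WireGuard default MTU (1500 minus up to 80 bytes of encapsulation overhead in the IPv6 worst case). Rather than guessing at a value like 1300, the highest workable setting can be probed with don't-fragment pings. A minimal sketch; the peer address 10.4.0.2 is a placeholder for a WireGuard IP in your mesh:

# -s sets the ICMP payload; the IP packet is 28 bytes larger (20 IP + 8 ICMP),
# so -s 1272 sends a 1300-byte packet with the don't-fragment bit set.
ping -c 3 -M do -s 1272 10.4.0.2

# Raise the size (e.g. -s 1372 for an effective MTU of 1400) until ping fails
# with "Message too long"; the last size that works is the usable ceiling.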

jbrinksmeier avatar Jul 31 '20 21:07 jbrinksmeier

Hi @jbrinksmeier thanks for raising this issue. This certainly seems like something worth documenting.

Do you have any idea why you needed to lower the MTU? Also, if you lowered the MTU in the CNI configuration, the change won't affect Pods running in the host networking namespace, so I wonder if these Pods will still have problems with large IP packets. Can you share the output of ip l on one of your hosts?
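To make the host-versus-CNI distinction concrete, here is a sketch of how one could compare the MTU each kind of Pod actually sees; the pod names are hypothetical:

# A Pod with hostNetwork: true shares the node's interfaces and keeps the
# physical MTU, while a CNI-attached Pod gets the bridge/veth MTU instead.
kubectl exec example-hostnetwork-pod -- cat /sys/class/net/eth0/mtu  # e.g. 1500
kubectl exec example-bridged-pod -- cat /sys/class/net/eth0/mtu      # e.g. 1300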

squat avatar Aug 08 '20 09:08 squat

Frankly, I have no idea why this websocket connection needed such a low(er) MTU. Decreasing it was in fact a lucky shot: I have run into this kind of issue with all sorts of VPN software before, so I tried it when this particular problem came up. We are not done testing yet, as I am on vacation right now, but so far we have had no issues with the running cluster, which operates fairly simple services such as various databases and some PHP containers. Anyway, here is the output of ip l on one host, as requested:

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
    link/ether 12:42:63:7c:30:c1 brd ff:ff:ff:ff:ff:ff
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
    link/ether ae:ce:41:23:45:f0 brd ff:ff:ff:ff:ff:ff
4: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default 
    link/ether 02:42:76:45:79:d0 brd ff:ff:ff:ff:ff:ff
83: kube-bridge: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1300 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 5e:64:6c:df:8a:86 brd ff:ff:ff:ff:ff:ff
85: kilo0: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1420 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/none 
86: tunl0@NONE: <NOARP,UP,LOWER_UP> mtu 1480 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
88: vethdcaf3da5@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1300 qdisc noqueue master kube-bridge state UP mode DEFAULT group default 
    link/ether 62:fa:29:a7:9e:ef brd ff:ff:ff:ff:ff:ff link-netnsid 0
148: veth5222bf5b@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1300 qdisc noqueue master kube-bridge state UP mode DEFAULT group default 
    link/ether a2:06:2e:fe:a3:85 brd ff:ff:ff:ff:ff:ff link-netnsid 1
161: veth78b9147f@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1300 qdisc noqueue master kube-bridge state UP mode DEFAULT group default 
    link/ether 5e:64:6c:df:8a:86 brd ff:ff:ff:ff:ff:ff link-netnsid 2
162: veth6c407d3d@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1300 qdisc noqueue master kube-bridge state UP mode DEFAULT group default 
    link/ether d2:08:43:6b:3c:6c brd ff:ff:ff:ff:ff:ff link-netnsid 3
164: veth6184d2bd@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1300 qdisc noqueue master kube-bridge state UP mode DEFAULT group default 
    link/ether b6:18:fd:86:41:77 brd ff:ff:ff:ff:ff:ff link-netnsid 4
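This output is consistent with the lucky shot: kube-bridge and the veths sit at 1300 while kilo0 is still at 1420, so an encapsulated pod packet of 1300 bytes plus roughly 60 to 80 bytes of WireGuard overhead still fits inside the tunnel. If it helps to compare nodes, a one-liner to print just interface names and MTUs:

ip -o link | awk '{print $2, $5}'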

jbrinksmeier avatar Aug 10 '20 09:08 jbrinksmeier

I only just noticed the part about pods in the host network; I had missed the significance of host networking in that regard. I will test in the coming days whether we can operate host-network pods properly.
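One quick way to run that test, sketched here with placeholder values (the image must ship iputils, since busybox's ping applet lacks -M; nicolaka/netshoot is one commonly used troubleshooting image):

# Throwaway pod on the host network; <other-node-ip> is a placeholder.
kubectl run mtu-test --rm -it --restart=Never \
  --image=nicolaka/netshoot \
  --overrides='{"apiVersion": "v1", "spec": {"hostNetwork": true}}' \
  -- ping -c 3 -M do -s 1472 <other-node-ip>

# -s 1472 plus 28 bytes of headers makes a full 1500-byte packet; if this
# fails while smaller sizes succeed, host-network traffic hits an MTU limit.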

jbrinksmeier avatar Aug 10 '20 10:08 jbrinksmeier

Could you please share the config for RKE setup? Thanks!

laci84 avatar Sep 05 '20 18:09 laci84

@squat So far we have had no issues running pods in the host network. The MTU problem seems to affect only the websocket request that registers the node with the Rancher cluster, which is a one-time task.

Could you please share the config for RKE setup? Thanks!

@laci84 Sure can. As mentioned, this is simply the manifest from https://github.com/squat/kilo/blob/master/manifests/kilo-k3s.yaml with a changed mtu setting. Here you go:

apiVersion: v1
kind: ConfigMap
metadata:
  name: kilo
  namespace: kube-system
  labels:
    app.kubernetes.io/name: kilo
data:
  cni-conf.json: |
    {
       "cniVersion":"0.3.1",
       "name":"kilo",
       "plugins":[
          {
             "name":"kubernetes",
             "type":"bridge",
             "bridge":"kube-bridge",
             "isDefaultGateway":true,
             "forceAddress":true,
             "mtu": 1300,
             "ipam":{
                "type":"host-local"
             }
          },
          {
             "type":"portmap",
             "snat":true,
             "capabilities":{
                "portMappings":true
             }
          }
       ]
    }
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kilo
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kilo
rules:
  - apiGroups:
      - ""
    resources:
      - nodes
    verbs:
      - list
      - patch
      - watch
  - apiGroups:
      - kilo.squat.ai
    resources:
      - peers
    verbs:
      - list
      - update
      - watch
  - apiGroups:
      - apiextensions.k8s.io
    resources:
      - customresourcedefinitions
    verbs:
      - create
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kilo
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kilo
subjects:
  - kind: ServiceAccount
    name: kilo
    namespace: kube-system
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kilo
  namespace: kube-system
  labels:
    app.kubernetes.io/name: kilo
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: kilo
  template:
    metadata:
      labels:
        app.kubernetes.io/name: kilo
    spec:
      serviceAccountName: kilo
      hostNetwork: true
      containers:
        - name: kilo
          image: squat/kilo
          args:
            - --kubeconfig=/etc/kubernetes/kubeconfig
            - --hostname=$(NODE_NAME)
            - --mesh-granularity=full
            - --subnet=10.5.0.0/24
          env:
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
          securityContext:
            privileged: true
          volumeMounts:
            - name: cni-conf-dir
              mountPath: /etc/cni/net.d
            - name: kilo-dir
              mountPath: /var/lib/kilo
            - name: kubeconfig
              mountPath: /etc/kubernetes/kubeconfig
              readOnly: true
            - name: lib-modules
              mountPath: /lib/modules
              readOnly: true
            - name: xtables-lock
              mountPath: /run/xtables.lock
              readOnly: false
      initContainers:
        - name: install-cni
          image: squat/kilo
          command:
            - /bin/sh
            - -c
            - set -e -x;
              cp /opt/cni/bin/* /host/opt/cni/bin/;
              TMP_CONF="$CNI_CONF_NAME".tmp;
              echo "$CNI_NETWORK_CONFIG" > $TMP_CONF;
              rm -f /host/etc/cni/net.d/*;
              mv $TMP_CONF /host/etc/cni/net.d/$CNI_CONF_NAME
          env:
            - name: CNI_CONF_NAME
              value: 10-kilo.conflist
            - name: CNI_NETWORK_CONFIG
              valueFrom:
                configMapKeyRef:
                  name: kilo
                  key: cni-conf.json
          volumeMounts:
            - name: cni-bin-dir
              mountPath: /host/opt/cni/bin
            - name: cni-conf-dir
              mountPath: /host/etc/cni/net.d
      tolerations:
        - effect: NoSchedule
          operator: Exists
        - effect: NoExecute
          operator: Exists
      volumes:
        - name: cni-bin-dir
          hostPath:
            path: /opt/cni/bin
        - name: cni-conf-dir
          hostPath:
            path: /etc/cni/net.d
        - name: kilo-dir
          hostPath:
            path: /var/lib/kilo
        - name: kubeconfig
          hostPath:
            path: /etc/kubernetes/kilo_kube_conf.yaml
        - name: lib-modules
          hostPath:
            path: /lib/modules
        - name: xtables-lock
          hostPath:
            path: /run/xtables.lock
            type: FileOrCreate
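
For anyone copying this: the kubeconfig hostPath (/etc/kubernetes/kilo_kube_conf.yaml) must exist on every node, so provision it before deploying. Assuming the manifest is saved locally as kilo-rke.yaml (the filename is arbitrary), applying and verifying it would look like:

kubectl apply -f kilo-rke.yaml
kubectl -n kube-system rollout status daemonset/kilo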

jbrinksmeier avatar Sep 07 '20 08:09 jbrinksmeier