
Konnectivity server & agent

Open maxpain opened this issue 1 year ago • 11 comments

Hello.

It would be cool to have built-in support for the Konnectivity agent and server in Talos Linux. Use case: I deployed my cluster on Hetzner, with the control plane nodes on cloud VMs and the worker nodes on bare metal. I installed Cilium with native routing (no encapsulation), since I have an L2 network between the bare metal nodes; however, the control plane nodes are in a different network. I don't want to install Cilium or any CNI on the control plane nodes, but kube-apiserver requires access to the cluster network (admission webhooks, etc.), so I have to use something like Konnectivity.


Or is there some elegant way to implement this in Talos Linux without Konnectivity?

maxpain avatar Sep 28 '24 22:09 maxpain

This could also potentially simplify "Kubernetes as a Service" implementations, in which control plane nodes are completely hidden and managed (like GKE).

maxpain avatar Sep 28 '24 22:09 maxpain

I was also wondering if KubeSpan could be helpful here. Maybe we can just route the service CIDR from the control plane to the worker nodes via WireGuard?

maxpain avatar Oct 01 '24 10:10 maxpain

Related

https://github.com/cilium/cilium/issues/22898 https://github.com/cilium/cilium/issues/32810

maxpain avatar Oct 01 '24 10:10 maxpain

I was also wondering if KubeSpan could be helpful here. Maybe we can just route the service CIDR from the control plane to the worker nodes via WireGuard?

KubeSpan today works on top of node IPs, so the CNI does the pod -> node translation.

KubeSpan has a setting to route pod IPs, but it is not enabled by default.

KubeSpan never worked for Service IPs, as that requires some form of kube-proxy which would translate service IPs to pod IPs.
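
For reference, the pod-IP routing mentioned above is controlled by a KubeSpan machine config flag; a minimal sketch, assuming the field is still called advertiseKubernetesNetworks (check the Talos configuration reference for your version):

machine:
  network:
    kubespan:
      enabled: true
      # Advertise pod CIDRs over the KubeSpan WireGuard mesh
      # (off by default, as noted above).
      advertiseKubernetesNetworks: true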

smira avatar Oct 01 '24 11:10 smira

KubeSpan never worked for Service IPs, as that requires some form of kube-proxy which would translate service IPs to pod IPs.

That translation would be done on the worker node side, since the CNI is installed there. Theoretically, we just need to route the service CIDR from the control plane nodes to the worker nodes somehow.
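
One crude way to express that idea as a Talos machine config patch on the control plane nodes would be a static route for the service CIDR pointing at a worker node that runs the CNI. This is only a sketch: the interface name, CIDR, and worker IP below are placeholders, and it assumes the worker is directly reachable and actually forwards that traffic:

machine:
  network:
    interfaces:
      - interface: eth0              # placeholder interface name
        routes:
          - network: 10.96.0.0/12    # placeholder service CIDR
            gateway: 10.200.0.10     # placeholder: a worker node running the CNI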

maxpain avatar Oct 01 '24 11:10 maxpain

Does Konnectivity route the service IPs properly? On a hosted platform (EKS, GKE) I'm not sure admission webhooks run on the control plane nodes. I think they usually run at some other endpoint (e.g. a VPC endpoint) that all of the nodes in the network can reach.

Would SideroLink with Omni solve this problem? I don't think so, but I don't know the full details of how Kubernetes services interact with SideroLink.

rothgar avatar Oct 02 '24 18:10 rothgar

@rothgar

Does Konnectivity route the service IPs properly?

Generally speaking (in a very similar config), yes, it routes correctly.

gecube avatar Oct 08 '24 16:10 gecube

I personally replaced Konnectivity with Tailscale, which works better for me, but if someone reads this and still needs Konnectivity, here is how I managed to deploy it on Talos:

Control plane machine config patch; we run konnectivity-server as a static pod on the control plane nodes:

cluster:
  proxy:
    disabled: true

  apiServer:
    extraArgs:
      egress-selector-config-file: /var/lib/egress-selector.yaml

    extraVolumes:
      - hostPath: /var/lib/egress-selector.yaml
        mountPath: /var/lib/egress-selector.yaml
        readonly: true
      - hostPath: /etc/kubernetes/konnectivity-server
        mountPath: /etc/kubernetes/konnectivity-server
        readonly: false

machine:
  files:
    - content: |
        apiVersion: apiserver.k8s.io/v1beta1
        kind: EgressSelectorConfiguration
        egressSelections:
        - name: cluster
          connection:
            proxyProtocol: GRPC
            transport:
              uds:
                udsName: /etc/kubernetes/konnectivity-server/konnectivity-server.socket
      path: /var/lib/egress-selector.yaml
      permissions: 0o444
      op: create

  pods:
    - apiVersion: v1
      kind: Pod
      metadata:
        name: konnectivity-server
      spec:
        priorityClassName: system-cluster-critical
        securityContext:
          runAsGroup: 65534
          runAsNonRoot: false
          runAsUser: 0
          fsGroup: 0
        hostNetwork: true
        initContainers:
          - name: init
            image: busybox
            command:
              - "sh"
              - "-c"
              - "touch /etc/kubernetes/konnectivity-server/konnectivity-server.socket && chmod -R 777 /etc/kubernetes/konnectivity-server"
            volumeMounts:
              - name: konnectivity-uds
                mountPath: /etc/kubernetes/konnectivity-server
        containers:
          - name: konnectivity-server-container
            image: registry.k8s.io/kas-network-proxy/proxy-server:v0.30.3
            command: ["/proxy-server"]
            args: [
                "--logtostderr=true",
                "--uds-name=/etc/kubernetes/konnectivity-server/konnectivity-server.socket",
                # "--delete-existing-uds-file",
                "--cluster-cert=/system/secrets/kubernetes/kube-apiserver/apiserver.crt",
                "--cluster-key=/system/secrets/kubernetes/kube-apiserver/apiserver.key",
                "--mode=grpc",
                "--server-port=0",
                "--agent-port=8132",
                "--admin-port=8133",
                "--health-port=8134",
                "--agent-namespace=kube-system",
                "--agent-service-account=konnectivity-agent",
                "--kubeconfig=/system/secrets/kubernetes/kube-scheduler/kubeconfig",
                "--authentication-audience=system:konnectivity-server",
              ]
            # There is no CNI on the control plane nodes, so we can use localhost.
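            # Port 7445 is the default local KubePrism endpoint (Talos's on-host API server load balancer).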
            env:
              - name: KUBERNETES_SERVICE_HOST
                value: localhost
              - name: KUBERNETES_SERVICE_PORT
                value: "7445"
            livenessProbe:
              httpGet:
                scheme: HTTP
                host: 127.0.0.1
                port: 8134
                path: /healthz
              initialDelaySeconds: 30
              timeoutSeconds: 60
            ports:
              - name: agentport
                containerPort: 8132
                hostPort: 8132
              - name: adminport
                containerPort: 8133
                hostPort: 8133
              - name: healthport
                containerPort: 8134
                hostPort: 8134
            volumeMounts:
              - name: secrets
                mountPath: /system/secrets/kubernetes
                readOnly: true
              - name: konnectivity-uds
                mountPath: /etc/kubernetes/konnectivity-server
                readOnly: false
        volumes:
          - name: secrets
            hostPath:
              path: /system/secrets/kubernetes
          - name: konnectivity-uds
            hostPath:
              path: /etc/kubernetes/konnectivity-server
              type: DirectoryOrCreate

konnectivity-agent DaemonSet to be deployed on the worker nodes:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  labels:
    addonmanager.kubernetes.io/mode: Reconcile
    k8s-app: konnectivity-agent
  namespace: kube-system
  name: konnectivity-agent
spec:
  selector:
    matchLabels:
      k8s-app: konnectivity-agent
  template:
    metadata:
      labels:
        k8s-app: konnectivity-agent
    spec:
      hostNetwork: true
      priorityClassName: system-cluster-critical
      tolerations:
        - key: "CriticalAddonsOnly"
          operator: "Exists"
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: node-role.kubernetes.io/control-plane
                    operator: DoesNotExist
      containers:
        - image: registry.k8s.io/kas-network-proxy/proxy-agent:v0.30.3
          name: konnectivity-agent
          command: ["/proxy-agent"]
          args: [
              "--logtostderr=true",
              "--ca-cert=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt",
              "--proxy-server-host=10.200.0.5", # I used Hetzner Cloud Load Balancer, which proxies requests to the Konnectivity server
              "--proxy-server-port=8132",
              "--admin-server-port=8133",
              "--health-server-port=8134",
              "--service-account-token-path=/var/run/secrets/tokens/konnectivity-agent-token",
            ]
          volumeMounts:
            - mountPath: /var/run/secrets/tokens
              name: konnectivity-agent-token
          livenessProbe:
            httpGet:
              port: 8134
              path: /healthz
            initialDelaySeconds: 15
            timeoutSeconds: 15
      serviceAccountName: konnectivity-agent
      volumes:
        - name: konnectivity-agent-token
          projected:
            sources:
              - serviceAccountToken:
                  path: konnectivity-agent-token
                  audience: system:konnectivity-server
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: system:konnectivity-server
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:auth-delegator
subjects:
  - apiGroup: rbac.authorization.k8s.io
    kind: User
    name: system:konnectivity-server
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: konnectivity-agent
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile

maxpain avatar Oct 16 '24 13:10 maxpain

Thanks for sharing. Was there a reason you switched to Tailscale instead of using Konnectivity? Would you mind adding these examples to our documentation in case someone wants to use it in the future?

rothgar avatar Oct 18 '24 22:10 rothgar

Was there a reason you switched to Tailscale instead of using Konnectivity?

Yes. In my case, the control plane nodes (Hetzner Cloud) could still reach the kubelets on the worker nodes (Hetzner bare metal) via a Hetzner vSwitch. The problem is that these are two different L2 subnets with a gateway between them, so the control plane nodes couldn't reach the pod and service networks on the vSwitch side without hitting that gateway (I use Cilium native routing, so the network has to be flat).

So I needed something to help the control plane nodes reach only the pod/service subnets, not the kubelets.

Tailscale is the perfect solution because it just routes those subnets over a P2P mesh, without the hassle of manifests and certificates that Konnectivity involves (which is why I reused the kube-scheduler kubeconfig and the kube-apiserver cert/key as a workaround). With Konnectivity you also need a load balancer in front of the konnectivity-server.

Also, I sometimes got "Agent is not available" errors from konnectivity-server when trying to view pod logs.
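
For anyone going the same Tailscale route, here is a rough sketch of the worker-side piece using the Talos Tailscale system extension to advertise the pod and service CIDRs as subnet routes. The CIDRs and auth key are placeholders, and the TS_ROUTES variable is an assumption to verify against the extension's documentation; the control plane side then has to accept those advertised routes:

apiVersion: v1alpha1
kind: ExtensionServiceConfig
name: tailscale
environment:
  - TS_AUTHKEY=tskey-auth-REPLACE_ME
  # Assumed variable: advertise pod and service CIDRs as Tailscale subnet routes.
  - TS_ROUTES=10.244.0.0/16,10.96.0.0/12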

maxpain avatar Oct 19 '24 07:10 maxpain

Would you mind adding these examples to our documentation in case someone wants to use it in the future?

What can we do about those workarounds with the kubeconfig, certs, and keys?

maxpain avatar Oct 19 '24 07:10 maxpain

Until we have a way to natively handle those secrets in Talos, it probably doesn't make sense to write docs for it. The manual way of writing secrets to disk and mounting them is not something we want to recommend.

rothgar avatar Oct 22 '24 23:10 rothgar

This issue is stale because it has been open 180 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot] avatar May 09 '25 02:05 github-actions[bot]

This issue was closed because it has been stalled for 7 days with no activity.

github-actions[bot] avatar May 14 '25 02:05 github-actions[bot]