
Add support for --disable-kube-proxy option when installing k3s

georgeVasiliu opened this issue 8 months ago · 7 comments

Changed src/configuration/main.cr:

  • added a disable_kube_proxy variable (type Bool, default value false);

Changed templates/master_install_script.sh:

  • added a new setting, KUBE_PROXY_ARGS;
  • set its default to the already existing value (--kube-proxy-arg="metrics-bind-address=0.0.0.0");
  • conditionally changed it to --disable-kube-proxy based on the disable_kube_proxy option.

The only part I am unsure of is this line:

 KUBE_PROXY_ARGS=" --kube-proxy-arg=\"metrics-bind-address=0.0.0.0\" "

I am not sure whether the quotes inside the value need to be escaped or not (I escaped them out of habit, but can't be sure).

georgeVasiliu · Dec 22 '23

Thank you for your work! I really like Cilium and would love to use this tool with it. I tested your changes, but I had to modify src/kubernetes/installer.cr to make them work.

diff --git a/src/kubernetes/installer.cr b/src/kubernetes/installer.cr
index 74dffa1..9da7e17 100644
--- a/src/kubernetes/installer.cr
+++ b/src/kubernetes/installer.cr
@@ -128,6 +128,7 @@ class Kubernetes::Installer
       k3s_version: settings.k3s_version,
       k3s_token: k3s_token,
       disable_flannel: settings.disable_flannel.to_s,
+      disable_kube_proxy: settings.disable_kube_proxy.to_s,
       flannel_backend: flannel_backend,
       taint: taint,
       extra_args: extra_args,

I also had to adjust the KUBE_PROXY_ARGS, because otherwise there were problems with escaping and k3s would not start.

diff --git a/templates/master_install_script.sh b/templates/master_install_script.sh
index 0480bde..818a159 100644
--- a/templates/master_install_script.sh
+++ b/templates/master_install_script.sh
@@ -19,7 +19,7 @@ fi
 if [[ "{{ disable_kube_proxy }}" = "true" ]]; then
   KUBE_PROXY_ARGS=" --disable-kube-proxy "
 else
-  KUBE_PROXY_ARGS=" --kube-proxy-arg=\"metrics-bind-address=0.0.0.0\" "
+  KUBE_PROXY_ARGS=" --kube-proxy-arg=metrics-bind-address=0.0.0.0 "
 fi
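
For anyone wondering why the escaped quotes break things: assuming the install script expands KUBE_PROXY_ARGS unquoted when assembling the k3s command (which is what the escaping problem suggests), the backslash-escaped quotes survive as literal characters in the argument that k3s receives. A minimal shell check:

KUBE_PROXY_ARGS=" --kube-proxy-arg=\"metrics-bind-address=0.0.0.0\" "
printf '[%s]\n' $KUBE_PROXY_ARGS    # [--kube-proxy-arg="metrics-bind-address=0.0.0.0"]  literal quotes reach k3s

KUBE_PROXY_ARGS=" --kube-proxy-arg=metrics-bind-address=0.0.0.0 "
printf '[%s]\n' $KUBE_PROXY_ARGS    # [--kube-proxy-arg=metrics-bind-address=0.0.0.0]    clean argument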

In my cluster config I had to adjust the private_network_subnet, because otherwise it overlaps with Cilium's subnet. But now Cilium is running with the kube-proxy replacement.

bbartsch · Dec 29 '23


Thanks for your reply! I added your changes to the pull request. I will also test again this coming week with this build.

For reference, here is the values.yaml file I used when installing Cilium via Helm:

global:
  identityAllocationMode: crd
  tunnel: vxlan
  autoDirectNodeRoutes: true
  nodeinit:
    enabled: true
  hubble:
    enabled: true
    listenAddress: ":4244"
    relay:
      enabled: true
      replicaCount: 2
    ui:
      enabled: true
      replicaCount: 2
  prometheus:
    enabled: true
  operator:
    prometheus:
      enabled: true

# Operator settings for redundancy
operator:
  replicaCount: 2
  prometheus:
    enabled: true

# Cilium DaemonSet settings for redundancy across nodes
cilium:
  # Ensuring a Cilium instance on each node
  daemonset:
    updateStrategy:
      type: RollingUpdate

# Enabling CiliumEndpointSlice for scalability in large clusters
ciliumEndpointSlice:
  enabled: true

# Enabling eBPF-based kube-proxy replacement
kubeProxyReplacement: "true"
k8sServiceHost: my.domain.com
k8sServicePort: 6443

# Enabling host firewall for additional network security
hostFirewall:
  enabled: true

# Configuring resource requests & limits for Cilium components
resources:
  requests:
    cpu: "100m"
    memory: "128Mi"
  limits:
    cpu: "1000m"
    memory: "1024Mi"

# Enabling logstash integration for better log management (if needed)
logstash:
  enabled: false

# Configuring Cilium endpoint routes
bpf:
  mapDynamicSizeRatio: 0.0025

# Configure Cilium Network Policies
networkPolicy:
  enabled: true

# Enable L7 proxy for HTTP/HTTPS traffic
l7Proxy: true

# Enable Envoy DaemonSet
envoy:
  enabled: true

# Enable Ingress
ingressController:
  enabled: true
  default: true
  loadBalancerMode: shared

The --disable-kube-proxy flag is a direct requirement for using "kubeProxyReplacement: true", and that in turn is required if you want to use Cilium's ingress controller instead of nginx.

I am not sure what you meant by "private_network_subnet" and the overlap. In my case, when creating the cluster I am using an "existing_network", and the "private_network_subnet" I have is set to the same value as the existing network's CIDR. This network can be created manually in the Hetzner UI or via Hetzner's APIs. Should there be any special considerations regarding it?

Once I manage to run the tests, I can also look at integrating cert-manager and ingress, and maybe we can add that as another example in the templates section.

georgeVasiliu · Dec 29 '23

Cilium uses 10.0.0.0/8 as the default pod CIDR [0], and Hetzner's default subnet is 10.0.0.0/16. This led to overlaps and routing problems, and Cilium was not ready. For this reason, I changed the Hetzner subnet to 192.168.30.0/24.

[0] https://docs.cilium.io/en/stable/network/concepts/ipam/cluster-pool/#check-for-conflicting-node-cidrs
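
If it helps for checking, both sides can be read directly (a sketch; assumes the hcloud CLI and jq are available and Cilium is already installed, so the config key below matches current Cilium releases):

hcloud network describe <your-network> -o json | jq -r '.ip_range, .subnets[].ip_range'   # Hetzner network and subnet ranges
kubectl -n kube-system get configmap cilium-config -o yaml | grep cluster-pool-ipv4-cidr  # Cilium cluster-pool pod CIDR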

bbartsch · Dec 29 '23

I have tried to make it work, but I think the whole subnet and networking setup is not working out as intended, at least for me.

I use the following k3s config to create the cluster:

---
hetzner_token: $HETZNER_TOKEN
cluster_name: k3s-test-1
kubeconfig_path: "./kubeconfig"
k3s_version: v1.28.5+k3s1
public_ssh_key_path: "~/.ssh/key_ecdsa.pub"
private_ssh_key_path: "~/.ssh/key_ecdsa"
use_ssh_agent: false # set to true if your key has a passphrase or if SSH connections don't work or seem to hang without agent. See https://github.com/vitobotta/hetzner-k3s#limitations
ssh_port: 22
ssh_allowed_networks:
  - 0.0.0.0/0 # Loop-back over private network in order to reach the NAT gateway
api_allowed_networks:
  - 0.0.0.0/0 # Loop-back over private network in order to reach the NAT gateway
disable_flannel: true # set to true since we want to install & use Cilium instead of Flannel
disable_kube_proxy: true # set to true since we want to use Cilium as a complete replacement for kube-proxy
private_network_subnet: 10.0.0.0/12 # Set to main subnet created in Hetzner Cloud UI
schedule_workloads_on_masters: false
enable_public_net_ipv4: false # default is true
enable_public_net_ipv6: false # default is true
enable_encryption: true
existing_network: main-network # Hetzner-specific name in Cloud UI
api_server_hostname: k3s-test-1.test.eu # optional: DNS for the k8s API LoadBalancer. After the script has run, create a DNS record with the address of the API LoadBalancer.
cloud_controller_manager_manifest_url: "https://github.com/hetznercloud/hcloud-cloud-controller-manager/releases/download/v1.18.0/ccm-networks.yaml"
csi_driver_manifest_url: "https://raw.githubusercontent.com/hetznercloud/csi-driver/v2.5.1/deploy/kubernetes/hcloud-csi.yml"
system_upgrade_controller_manifest_url: "https://raw.githubusercontent.com/rancher/system-upgrade-controller/master/manifests/system-upgrade-controller.yaml"
masters_pool:
  instance_type: cpx21
  instance_count: 1
  location: nbg1
  labels:
    - key: node-type
      value: worker
    - key: cluster
      value: k3s-test-1
worker_node_pools:
- name: small-static
  instance_type: cpx21
  instance_count: 2
  location: nbg1
  labels:
    - key: node-type
      value: worker
    - key: cluster
      value: k3s-test-1
  additional_packages:
    - unattended-upgrades
    - update-notifier-common
  post_create_commands:
    - >
      printf "network## {config## disabled}" |
      sed 's/##/:/g' > /etc/cloud/cloud.cfg.d/99-disable-network-config.cfg
    - >
      printf "network##\n  version## 2\n  renderer## networkd\n  ethernets##\n    enp7s0##\n      dhcp4## true\n      nameservers##\n        addresses## [185.12.64.1, 185.12.64.2]\n      routes##\n        - to## default\n          via## 10.0.0.1" |
      sed 's/##/:/g' > /etc/netplan/50-cloud-init.yaml
    - netplan generate
    - netplan apply
    - apt update
    - apt upgrade -y
    - apt install nfs-common -y
    - apt autoremove -y
    - sudo apt install wireguard
    - sudo systemctl enable unattended-upgrades
    - sudo systemctl start unattended-upgrades
additional_packages:
  - unattended-upgrades
  - update-notifier-common
post_create_commands:
  - >
    printf "network## {config## disabled}" |
    sed 's/##/:/g' > /etc/cloud/cloud.cfg.d/99-disable-network-config.cfg
  - >
    printf "network##\n  version## 2\n  renderer## networkd\n  ethernets##\n    enp7s0##\n      dhcp4## true\n      nameservers##\n        addresses## [185.12.64.1, 185.12.64.2]\n      routes##\n        - to## default\n          via## 10.0.0.1" |
    sed 's/##/:/g' > /etc/netplan/50-cloud-init.yaml
  - netplan generate
  - netplan apply
  - apt update
  - apt upgrade -y
  - apt install nfs-common -y
  - apt autoremove -y
  - sudo apt install wireguard
  - sudo systemctl enable unattended-upgrades
  - sudo systemctl start unattended-upgrades
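
For readability, the second post_create_command above (after sed swaps ## back to :) writes roughly the following netplan file, shown here as an equivalent heredoc:

cat <<'EOF' > /etc/netplan/50-cloud-init.yaml
network:
  version: 2
  renderer: networkd
  ethernets:
    enp7s0:
      dhcp4: true
      nameservers:
        addresses: [185.12.64.1, 185.12.64.2]
      routes:
        - to: default
          via: 10.0.0.1
EOF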

And the following cilium values.yaml file:

identityAllocationMode: crd
tunnel: disabled # Disable VXLAN and use native routing
autoDirectNodeRoutes: true # Auto directing nodes requires the following field to be specified
ipv4NativeRoutingCIDR: "10.0.0.0/8" # Adjust this to match the Hetzner Network CIDR
hubble:
  enabled: true
  relay:
    enabled: true
    replicaCount: 2
  ui:
    enabled: true
    replicaCount: 2
kubeProxyReplacement: "strict" # Explicitly enable kube-proxy replacement to strict
k8sServiceHost: k3s-test-1.test.eu
k8sServicePort: 6443

operator:
  replicaCount: 2

debug:
  enabled: true
  
daemonset:
  updateStrategy:
    type: RollingUpdate

bpf:
  masquerade: true  # Enable BPF masquerading

hostFirewall:
  enabled: true

resources:
  requests:
    cpu: "100m"
    memory: "128Mi"
  limits:
    cpu: "1000m"
    memory: "1024Mi"

l7Proxy: true

encryption:
  enabled: true
  type: wireguard

ipam:
  mode: "cluster-pool"
  operator:
    clusterPoolIPv4PodCIDRList:
      - "10.244.0.0/16 " # Make sure to use a non-overlapping CIDR

ingressController:
  enabled: true
  loadBalancerMode: shared # Choose between "shared" or "dedicated"
  default: true # Set as default ingress controller
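
A values file like this is applied with the upstream chart roughly as follows (illustrative; the repo URL and chart name are the standard Cilium ones):

helm repo add cilium https://helm.cilium.io/
helm repo update
helm install cilium cilium/cilium --version 1.14.5 \
  --namespace kube-system \
  --values values.yaml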

With this cluster and Cilium 1.14.5 deployed, however, I observe the following:

  • the nodes are always stuck in status NotReady;
  • the nodes always report the same error when describing them:

    Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Wed, 10 Jan 2024 11:11:48 +0000   Wed, 10 Jan 2024 11:04:46 +0000   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Wed, 10 Jan 2024 11:11:48 +0000   Wed, 10 Jan 2024 11:04:46 +0000   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Wed, 10 Jan 2024 11:11:48 +0000   Wed, 10 Jan 2024 11:04:46 +0000   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            False   Wed, 10 Jan 2024 11:11:48 +0000   Wed, 10 Jan 2024 11:04:46 +0000   KubeletNotReady              container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized
  EtcdIsVoter      True    Wed, 10 Jan 2024 11:04:55 +0000   Wed, 10 Jan 2024 11:04:55 +0000   MemberNotLearner             Node is a voting member of the etcd cluster

  • all the pods are stuck either in Pending or ContainerCreating (the other pods, not the Cilium pods):
NAME                                               READY   STATUS              RESTARTS   AGE
cilium-operator-59b749dc64-5cg8v                   1/1     Running             0          45s
cilium-operator-59b749dc64-klhlb                   1/1     Running             0          45s
coredns-6799fbcd5-9249w                            0/1     Pending             0          12m
hcloud-cloud-controller-manager-85f8955f9f-vrxj8   0/1     Pending             0          11m
hcloud-csi-controller-5868f9d75c-ntpfx             0/5     Pending             0          11m
hcloud-csi-node-mdc4z                              0/3     ContainerCreating   0          11m
hcloud-csi-node-n6x24                              0/3     ContainerCreating   0          11m
hcloud-csi-node-xqj9n                              0/3     ContainerCreating   0          11m
hubble-relay-57487848bc-vbw5t                      0/1     Pending             0          45s
hubble-ui-6f48889749-ggjv7                         0/2     Pending             0          45s

  • all the Cilium pods except Hubble are running properly;
  • cilium-cli reports everything okay, except for Hubble of course:
Deployment             cilium-operator    Desired: 2, Ready: 2/2, Available: 2/2
Deployment             hubble-ui          Desired: 1, Unavailable: 1/1
Deployment             hubble-relay       Desired: 1, Unavailable: 1/1
DaemonSet              cilium
Containers:            hubble-ui          Pending: 1
                       hubble-relay       Pending: 1
                       cilium
                       cilium-operator    Running: 2
Cluster Pods:          0/8 managed by Cilium
Helm chart version:    1.14.5
Image versions         cilium-operator    quay.io/cilium/operator-generic:v1.14.5@sha256:303f9076bdc73b3fc32aaedee64a14f6f44c8bb08ee9e3956d443021103ebe7a: 2
                       hubble-ui          quay.io/cilium/hubble-ui:v0.12.1@sha256:9e5f81ee747866480ea1ac4630eb6975ff9227f9782b7c93919c081c33f38267: 1
                       hubble-ui          quay.io/cilium/hubble-ui-backend:v0.12.1@sha256:1f86f3400827a0451e6332262467f894eeb7caf0eb8779bd951e2caa9d027cbe: 1
                       hubble-relay       quay.io/cilium/hubble-relay:v1.14.5@sha256:dbef89f924a927043d02b40c18e417c1ea0e8f58b44523b80fef7e3652db24d4: 1
Errors:                hubble-ui          hubble-ui                        1 pods of Deployment hubble-ui are not ready
                       hubble-relay       hubble-relay                     1 pods of Deployment hubble-relay are not ready
Warnings:              hubble-ui          hubble-ui-6f48889749-ggjv7       pod is pending
                       hubble-relay       hubble-relay-57487848bc-vbw5t    pod is pending
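
The "cni plugin not initialized" condition above usually means no CNI config has been written yet where the kubelet looks for it. Checks of this sort help narrow it down (illustrative; the first path is k3s's default agent CNI directory):

ls /var/lib/rancher/k3s/agent/etc/cni/net.d/ /etc/cni/net.d/ 2>/dev/null    # has Cilium written a conflist?
kubectl -n kube-system get pods -l k8s-app=cilium -o wide                   # is an agent scheduled on every node?
kubectl -n kube-system logs ds/cilium --tail=50                             # agent-side errors (IPAM, routing, ...)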

Looking inside the master node, it seems that the k3s server is running as expected and has the "--disable-kube-proxy" flag added:

root@k3s-test-1-cpx21-master1:~# cat /etc/systemd/system/k3s.service
[Unit]
Description=Lightweight Kubernetes
Documentation=https://k3s.io
Wants=network-online.target
After=network-online.target

[Install]
WantedBy=multi-user.target

[Service]
Type=notify
EnvironmentFile=-/etc/default/%N
EnvironmentFile=-/etc/sysconfig/%N
EnvironmentFile=-/etc/systemd/system/k3s.service.env
KillMode=process
Delegate=yes
# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNOFILE=1048576
LimitNPROC=infinity
LimitCORE=infinity
TasksMax=infinity
TimeoutStartSec=0
Restart=always
RestartSec=5s
ExecStartPre=/bin/sh -xc '! /usr/bin/systemctl is-enabled --quiet nm-cloud-setup.service 2>/dev/null'
ExecStartPre=-/sbin/modprobe br_netfilter
ExecStartPre=-/sbin/modprobe overlay
ExecStart=/usr/local/bin/k3s \
    server \
        '--disable-cloud-controller' \
        '--disable' \
        'servicelb' \
        '--disable' \
        'traefik' \
        '--disable' \
        'local-storage' \
        '--disable' \
        'metrics-server' \
        '--write-kubeconfig-mode=644' \
        '--node-name=k3s-test-1-cpx21-master1' \
        '--cluster-cidr=10.244.0.0/16' \
        '--service-cidr=10.43.0.0/16' \
        '--cluster-dns=10.43.0.10' \
        '--etcd-expose-metrics=true' \
        '--kube-controller-manager-arg=bind-address=0.0.0.0' \
        '--disable-kube-proxy' \
        '--kube-scheduler-arg=bind-address=0.0.0.0' \
        '--node-taint' \
        'CriticalAddonsOnly=true:NoExecute' \
        '--flannel-backend=none' \
        '--disable-network-policy' \
        '--kubelet-arg=cloud-provider=external' \
        '--advertise-address=10.0.0.7' \
        '--node-ip=10.0.0.7' \
        '--node-external-ip=10.0.0.7' \
        '--cluster-init' \
        '--tls-san=10.0.0.7' \
        '--tls-san=k3s-test-1.test.eu' \
        '--tls-san=10.0.0.7' \

However, the nodes never reach a Ready status, so something must still be amiss in this configuration. If anything, I presume it's the 10.0.0.0/8 CIDR used in the Hetzner Private Network and how it overlaps with Cilium's ranges.

Even if that were the case, it still doesn't make a lot of sense because:

  • Cilium has ipv4NativeRoutingCIDR set to 10.0.0.0/8, allowing native routing across all the machines in this network;
  • Cilium has 10.244.0.0/16 for the ipam.operator.clusterPoolIPv4PodCIDRList field, meaning it should allocate pod IPs from this CIDR (which is decoupled from the main one where the other servers are present);
  • maybe worth mentioning: there ARE a couple of other servers deployed in the Hetzner Private Network, but they shouldn't overlap with anything related to Cilium.

I'll keep on looking into it and see why it doesn't work with this setup.
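
For the native-routing side, checks along these lines help narrow it down (illustrative; autoDirectNodeRoutes only installs per-node pod CIDR routes when the nodes share an L2 segment):

ip route | grep '10.244.'                                                    # are per-node podCIDR routes present on each node?
kubectl -n kube-system exec ds/cilium -- cilium status | grep -iE 'routing|masquerading'
kubectl -n kube-system exec ds/cilium -- cilium node list                    # node IPs and pod CIDRs as Cilium sees them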

----------- Later edit:

It seems the trouble is caused by the settings specified for Cilium: if I disable native routing and go with VXLAN tunneling, all the pods come online and everything works fine. As soon as native routing is enabled, I end up in the situation described above.

So while it is possible to go ahead with VXLAN, I would rather first understand why it doesn't work with native routing.

georgeVasiliu · Jan 10 '24

In the end, I managed to get it solved and running using native routing... however, I am not completely sure exactly what I did to fix the issue, as I basically re-arranged and re-wrote everything from scratch, closely following the official Cilium Helm reference documentation for the values. I also noticed a few things that seem off between the documentation and the actual results when installing Cilium via Helm: one such thing is "routingMode", which the Helm reference says can be "", "native", or "tunnel", yet when installing Cilium and viewing the diff, the Cilium ConfigMap clearly states that the possible values for "routingMode" are "vxlan", "geneve", "disabled" (??).

Anyway, here is the values.yaml I used for cilium:

autoDirectNodeRoutes: true # Advised for true in case of native routing
bpf:
  masquerade: true # without this on, cilium runs in host-legacy mode using iptables instead of eBPF for some reason (?)
debug:
  enabled: true
encryption:
  enabled: true
  type: wireguard # Requires wireguard to be preemptively installed on all nodes
hostFirewall:
  enabled: true
hubble:
  enabled: true
  relay:
    enabled: true
    replicaCount: 2
  ui:
    enabled: true
    replicaCount: 2
identityAllocationMode: crd
ipam:
  mode: "cluster-pool"
  operator:
    clusterPoolIPv4PodCIDRList:
      - "10.224.0.0/16 " # Using cluster-cidr passed to k3s by the hetnzer-k3s tool, if you had overriden cluster-cidr you need to use same CIDR here
ipv4NativeRoutingCIDR: "10.0.0.0/8" # Adjust this to match your Hetzner Private Network CIDR
kubeProxyReplacement: "true" # Explicitly enable kube-proxy replacement to true
k8sServiceHost: k3s-test-1.test.eu # Needs explicit A record in DNS Console
k8sServicePort: 6443
l7Proxy: true
loadBalancer:
  algorithm: "maglev"
maglev:
  tableSize: 16381 # Suitable for max ~160 backends per service
  hashSeed: $MAGLEV_SEED # base64-encoded 12-byte random number (see the note after this file)
operator:
  replicaCount: 2
resources:
  requests:
    cpu: "100m"
    memory: "128Mi"
  limits:
    cpu: "1000m"
    memory: "1024Mi"
routingMode: "native" # Best solution for no-overhead networking, requires pure L2 network for the pods though
tunnel: "disabled" # Disabled tunneling in favor of native routing via underlying CCM (in our case Hetzner CCM)

All the Cilium pods, Hubble and everything else went online, though it did take quite some time (~5 minutes, I think). And all the nodes reported status Ready this time, with all the hcloud-related pods starting and running successfully. Inspecting the Cilium pods and running various commands revealed that it had indeed fully replaced kube-proxy.
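
Checks along these lines are what show the replacement (illustrative; with kube-proxy disabled there should be no KUBE-SVC iptables chains, and the agent lists the services it handles in eBPF):

iptables-save | grep -c 'KUBE-SVC'                                           # run on a node; expect 0
kubectl -n kube-system exec ds/cilium -- cilium status | grep KubeProxyReplacement
kubectl -n kube-system exec ds/cilium -- cilium service list                 # services handled by the eBPF datapath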

Edit:

  1. Why IS Cilium actually able to run in native-routing mode with this setup? I read more about it, and while it indeed requires a pure L2 network for the pods, the Private Network from Hetzner is not pure L2, since it always has a gateway reserved at ....1. That should supposedly make it an L3 network, which Cilium shouldn't be able to interact with in native mode and which would require tunneling.
  2. Why would Cilium require explicit masquerading in order to use eBPF? I thought that since the underlying nodes meet all the system requirements, it would default to eBPF instead of iptables. (A quick check for the active mode is shown below.)
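
A quick, illustrative way to see which datapath is actually active (BPF vs iptables masquerading, native vs tunnel routing):

kubectl -n kube-system exec ds/cilium -- cilium status | grep -iE 'masquerading|routing'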

georgeVasiliu · Jan 14 '24

Hi, sorry for the delay. I finally have a holiday this coming week, so I can spend some time on this project. kube-proxy is already disabled if you disable Flannel because you want to install something else, see https://github.com/vitobotta/hetzner-k3s/blob/4995188692a5818e2b6b71265754f123efda9d99/templates/master_install_script.sh#L14.

I guess we can close this?

vitobotta · Apr 12 '24

Closing as per my last remarks. I also added native support for Cilium in https://github.com/vitobotta/hetzner-k3s/pull/348 (for the upcoming new major version v2.0.0), so you can now choose Cilium as the CNI.

vitobotta · Apr 28 '24