
Robot server with cloud load balancer over vswitch | hcloud-ccm CrashLoopBackOff

Open caynev opened this issue 5 months ago • 2 comments

TL;DR

We are trying to set up a K3s cluster on dedicated Hetzner servers (AX42) with cloud load balancer support over the vswitch.

Expected behavior

A running hcloud-cloud-controller-manager with load balancer support over the vswitch.

Observed behavior

As soon as we add the following environment variables from your "How to attach load balancers to Robot private IPs" documentation, the ccm pod crashes with this message:

F0513 15:12:26.452494       1 main.go:62] Cloud provider could not be initialized: hcloud/newCloud: checking if server is in Network not possible: serverIsAttachedToNetwork: Get "http://169.254.169.254/hetzner/v1/metadata/private-networks": context deadline exceeded (Client.Timeout exceeded while awaiting h

helm-values.yaml

env:
  HCLOUD_NETWORK:
    valueFrom:
      secretKeyRef:
        name: hcloud
        key: network

  HCLOUD_NETWORK_ROUTES_ENABLED:
    value: "false"

Minimal working example

k3s server (version v1.32.4+k3s1) with the following config:

  node-ip: "{{ ansible_facts['enp6s0.4000']['ipv4']['address'] }}"
  flannel-backend: 'none'
  protect-kernel-defaults: true
  secrets-encryption: true
  kube-apiserver-arg:
    - 'audit-log-path=/var/lib/rancher/k3s/server/logs/audit.log'
    - 'audit-policy-file=/var/lib/rancher/k3s/server/audit.yaml'
    - 'audit-log-maxage=30'
    - 'audit-log-maxbackup=10'
    - 'audit-log-maxsize=100'
    - 'enable-admission-plugins=NodeRestriction,EventRateLimit'
    - 'admission-control-config-file=/var/lib/rancher/k3s/server/psa.yaml'
  kubelet-arg:
    - 'cloud-provider=external'
    - 'streaming-connection-idle-timeout=5m'
    - "tls-cipher-suites=TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,\
       TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,\
       TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,\
       TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,\
       TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,\
       TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305"
  kube-controller-manager-arg: 'terminated-pod-gc-threshold=10'
  disable-network-policy: true
  disable-kube-proxy: true
  disable-cloud-controller: true
  disable:
    - 'servicelb'
    - 'traefik'
    - 'local-storage'

Cilium helm chart (version 1.17.3)

cilium-helm-values.yaml

operator:
  rollOutPods: true
  nodeSelector:
    'node-role.kubernetes.io/control-plane': "true"

hostFirewall:
  enabled: false

rollOutCiliumPods: true

encryption:
  enabled: true
  type: 'wireguard'
  nodeEncryption: true

k8sServiceHost: "127.0.0.1"
k8sServicePort: 6444

kubeProxyReplacement: true
kubeProxyReplacementHealthzBindAddr: "0.0.0.0:10256"

k8sClientRateLimit:
  qps: 50
  burst: 200

hcloud ccm (version 1.24.0)

hcloud-ccm-helm-values.yaml

robot:
  enabled: true

networking:
  enabled: false

env:
  HCLOUD_NETWORK:
    valueFrom:
      secretKeyRef:
        name: hcloud
        key: network

  HCLOUD_NETWORK_ROUTES_ENABLED:
    value: "false"

  HCLOUD_TOKEN:
    valueFrom:
      secretKeyRef:
        name: hcloud
        key: token

  ROBOT_USER:
    valueFrom:
      secretKeyRef:
        name: hcloud
        key: robot-user
        optional: true

  ROBOT_PASSWORD:
    valueFrom:
      secretKeyRef:
        name: hcloud
        key: robot-password
        optional: true

Log output

Flag --allow-untagged-cloud has been deprecated, This flag is deprecated and will be removed in a future release. A cluster-id will be required on cloud instances.
I0513 15:26:23.546492       1 serving.go:386] Generated self-signed cert in-memory
W0513 15:26:23.546549       1 client_config.go:667] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I0513 15:26:23.738562       1 metrics.go:67] Starting metrics server at :8233
F0513 15:26:28.871713       1 main.go:62] Cloud provider could not be initialized: hcloud/newCloud: checking if server is in Network not possible: serverIsAttachedToNetwork: Get "http://169.254.169.254/hetzner/v1/metadata/private-networks": context deadline exceeded (Client.Timeout exceeded while awaiting headers)

Additional information

If we set the following environment variable, which we found in the Go source code, hcloud-ccm starts:

env:
  HCLOUD_NETWORK_DISABLE_ATTACHED_CHECK:
    value: "true"

If this is expected behavior, it would be nice to have it mentioned in the documentation.
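
For reference, here is a sketch of how the workaround sits next to the variables from the documentation in our hcloud-ccm-helm-values.yaml. It only combines values already shown in this issue. We are assuming the chart passes HCLOUD_NETWORK_DISABLE_ATTACHED_CHECK through the env map like any other variable, and we have not confirmed that skipping the check is the intended setup for Robot servers.

```yaml
robot:
  enabled: true

networking:
  enabled: false

env:
  HCLOUD_NETWORK:
    valueFrom:
      secretKeyRef:
        name: hcloud
        key: network

  HCLOUD_NETWORK_ROUTES_ENABLED:
    value: "false"

  # Workaround found in the Go source, not in the documentation:
  # skips the startup check that queries
  # http://169.254.169.254/hetzner/v1/metadata/private-networks,
  # which timed out on our Robot server.
  HCLOUD_NETWORK_DISABLE_ATTACHED_CHECK:
    value: "true"

  # HCLOUD_TOKEN, ROBOT_USER and ROBOT_PASSWORD stay as listed above.
```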

caynev, May 13 '25 15:05