hcloud-cloud-controller-manager
Pod crashes when setting HCLOUD_NETWORK and network: false
TL;DR
Despite networking.enabled: false, hcloud-cloud-controller-manager tries to start the node-route-controller. The node-route-controller then fails due to the missing CIDR.
Expected behavior
hcloud-cloud-controller-manager starts up and configures the node metadata.
Observed behavior
hcloud-cloud-controller-manager pod crashes with
E0404 08:31:13.192689 1 controllermanager.go:321] Error starting "node-route-controller"
F0404 08:31:13.192717 1 controllermanager.go:223] error running controllers: invalid CIDR[0]: <nil> (invalid CIDR address: )
Minimal working example
command:
helm upgrade --install hccm \
--version 1.19.0 \
-n kube-system \
-f hccm-values.yaml \
hcloud/hcloud-cloud-controller-manager
hccm-values.yaml:
networking:
enabled: false
network:
valueFrom:
secretKeyRef:
name: hcloud
key: network
Remark: The same happens when configuring
env:
# ...
HCLOUD_NETWORK:
valueFrom:
secretKeyRef:
name: hcloud
key: network
as described in the README.md.
Log output
Flag --allow-untagged-cloud has been deprecated, This flag is deprecated and will be removed in a future release. A cluster-id will be required on cloud instances.
I0404 08:31:09.676489 1 serving.go:348] Generated self-signed cert in-memory
W0404 08:31:09.676594 1 client_config.go:618] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I0404 08:31:10.690398 1 metrics.go:69] Starting metrics server at :8233
I0404 08:31:13.018003 1 cloud.go:123] Hetzner Cloud k8s cloud controller v1.19.0 started
W0404 08:31:13.018036 1 main.go:75] detected a cluster without a ClusterID. A ClusterID will be required in the future. Please tag your cluster to avoid any future issues
I0404 08:31:13.018060 1 controllermanager.go:168] Version: v0.0.0-master+$Format:%H$
I0404 08:31:13.024573 1 requestheader_controller.go:169] Starting RequestHeaderAuthRequestController
I0404 08:31:13.024619 1 shared_informer.go:311] Waiting for caches to sync for RequestHeaderAuthRequestController
I0404 08:31:13.024657 1 configmap_cafile_content.go:202] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file"
I0404 08:31:13.024681 1 shared_informer.go:311] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0404 08:31:13.024898 1 configmap_cafile_content.go:202] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::client-ca-file"
I0404 08:31:13.025064 1 shared_informer.go:311] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0404 08:31:13.025905 1 secure_serving.go:213] Serving securely on [::]:10258
I0404 08:31:13.027380 1 tlsconfig.go:240] "Starting DynamicServingCertificateController"
E0404 08:31:13.051293 1 controllermanager.go:524] unable to get all supported resources from server: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: stale GroupVersion discovery: metrics.k8s.io/v1beta1
I0404 08:31:13.051766 1 controllermanager.go:337] Started "cloud-node-controller"
I0404 08:31:13.051958 1 controllermanager.go:337] Started "cloud-node-lifecycle-controller"
I0404 08:31:13.052000 1 node_controller.go:165] Sending events to api server.
I0404 08:31:13.052081 1 node_controller.go:174] Waiting for informer caches to sync
I0404 08:31:13.052165 1 node_lifecycle_controller.go:113] Sending events to api server
I0404 08:31:13.052269 1 controllermanager.go:337] Started "service-lb-controller"
I0404 08:31:13.052355 1 controller.go:231] Starting service controller
I0404 08:31:13.052382 1 shared_informer.go:311] Waiting for caches to sync for service
I0404 08:31:13.125572 1 shared_informer.go:318] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0404 08:31:13.125587 1 shared_informer.go:318] Caches are synced for RequestHeaderAuthRequestController
I0404 08:31:13.125821 1 shared_informer.go:318] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
E0404 08:31:13.192689 1 controllermanager.go:321] Error starting "node-route-controller"
F0404 08:31:13.192717 1 controllermanager.go:223] error running controllers: invalid CIDR[0]: <nil> (invalid CIDR address: )
Additional information
- HelmChart version 1.19.0
- k3s version v1.29.2+k3s1 running with
--kubelet-arg="cloud-provider=external"
Without setting HCLOUD_NETWORK,
hcloud-cloud-controller-manager is unable to determine the node addresses:
I0404 08:26:48.044310 1 node_controller.go:431] Initializing node k3s-controlplane1 with cloud provider
E0404 08:26:48.247486 1 node_controller.go:240] error syncing 'k3s-controlplane1': failed to get node modifiers from cloud provider: provided node ip for node "k3s-controlplane1" is not valid: failed to get node address from cloud provider that matches ip: 10.0.0.2, requeuing
I0404 08:26:48.247561 1 node_controller.go:431] Initializing node k3s-controlplane2 with cloud provider
E0404 08:26:48.688221 1 node_controller.go:240] error syncing 'k3s-controlplane2': failed to get node modifiers from cloud provider: provided node ip for node "k3s-controlplane2" is not valid: failed to get node address from cloud provider that matches ip: 10.0.0.3, requeuing
I0404 08:26:48.688270 1 node_controller.go:431] Initializing node k3s-controlplane3 with cloud provider
E0404 08:26:48.954460 1 node_controller.go:240] error syncing 'k3s-controlplane3': failed to get node modifiers from cloud provider: provided node ip for node "k3s-controlplane3" is not valid: failed to get node address from cloud provider that matches ip: 10.0.0.4, requeuing
And for the sake of completeness, with hccm-values.yaml
---
networking:
enabled: true
clusterCIDR: 10.42.0.0/16
network:
valueFrom:
secretKeyRef:
name: hcloud
key: network
the hcloud-cloud-controller-manager starts and adds the metadata as expected.
This is not a solution for us, since
a) we don't want hccm to manage the routes, and
b) we want to use robot.enabled: true.
Just to clarify, you mentioned "HelmChart version 3.3.0" in the original issue. We do not have a Helm chart with that version; the current version is 1.19.0.
Sorry, that was a copy-and-paste error. I'm using 1.19.0, as in the helm command line.
I am unable to reproduce this with hccm 1.19.0
and the values file you provided.
While trying to reproduce this I noticed that you also need to provide the k3s flag --disable-cloud-controller, as otherwise k3s starts its own cloud-controller-manager, which conflicts with hccm. You will then see these error messages:
Error getting instance metadata for node addresses: hcloud/instancesv2.InstanceMetadata: failed to convert provider id to server id: providerID does not have one of the the expected prefixes (hcloud://, hrobot://, hcloud://bm-): k3s://hetzner-k3s
I installed k3s with:
curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="--kubelet-arg=cloud-provider=external --disable-cloud-controller" INSTALL_K3S_VERSION="v1.29.2+k3s1" sh -
Then created a secret for hccm:
kubectl create secret generic -n kube-system hcloud --from-literal=token=$HCLOUD_TOKEN --from-literal=network=hetzner-k3s
And installed the chart the same way you did with the first hccm-values.yaml
in the original description.
Could you post the output of the two following commands here?
- kubectl get deployment -n kube-system hcloud-cloud-controller-manager -o yaml
- kubectl get node k3s-controlplane1 -o yaml
My bad. I must have gotten lost in the values.
The described behaviour
E0404 11:02:01.187593 1 controllermanager.go:321] Error starting "node-route-controller"
F0404 11:02:01.187624 1 controllermanager.go:223] error running controllers: invalid CIDR[0]: <nil> (invalid CIDR address: )
happens with the values.yaml
env:
HCLOUD_TOKEN:
valueFrom:
secretKeyRef:
name: hcloud
key: token
HCLOUD_NETWORK:
valueFrom:
secretKeyRef:
name: hcloud
key: network
networking:
enabled: false
robot:
enabled: false
Note: k3s is running with --disable-cloud-controller.
kubectl get deployment -n kube-system hcloud-cloud-controller-manager -o yaml
apiVersion: apps/v1
kind: Deployment
metadata:
annotations:
deployment.kubernetes.io/revision: "1"
meta.helm.sh/release-name: hccm
meta.helm.sh/release-namespace: kube-system
creationTimestamp: "2024-04-04T11:19:10Z"
generation: 1
labels:
app.kubernetes.io/managed-by: Helm
name: hcloud-cloud-controller-manager
namespace: kube-system
resourceVersion: "3440"
uid: 62e7b715-e99d-4878-8133-d01cd17a95be
spec:
progressDeadlineSeconds: 600
replicas: 1
revisionHistoryLimit: 2
selector:
matchLabels:
app.kubernetes.io/instance: hccm
app.kubernetes.io/name: hcloud-cloud-controller-manager
strategy:
rollingUpdate:
maxSurge: 25%
maxUnavailable: 25%
type: RollingUpdate
template:
metadata:
creationTimestamp: null
labels:
app.kubernetes.io/instance: hccm
app.kubernetes.io/name: hcloud-cloud-controller-manager
spec:
containers:
- command:
- /bin/hcloud-cloud-controller-manager
- --allow-untagged-cloud
- --cloud-provider=hcloud
- --route-reconciliation-period=30s
- --webhook-secure-port=0
- --leader-elect=false
env:
- name: HCLOUD_NETWORK
valueFrom:
secretKeyRef:
key: network
name: hcloud
- name: HCLOUD_TOKEN
valueFrom:
secretKeyRef:
key: token
name: hcloud
- name: ROBOT_PASSWORD
valueFrom:
secretKeyRef:
key: robot-password
name: hcloud
optional: true
- name: ROBOT_USER
valueFrom:
secretKeyRef:
key: robot-user
name: hcloud
optional: true
image: hetznercloud/hcloud-cloud-controller-manager:v1.19.0
imagePullPolicy: IfNotPresent
name: hcloud-cloud-controller-manager
ports:
- containerPort: 8233
name: metrics
protocol: TCP
resources:
requests:
cpu: 100m
memory: 50Mi
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
dnsPolicy: Default
priorityClassName: system-cluster-critical
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
serviceAccount: hcloud-cloud-controller-manager
serviceAccountName: hcloud-cloud-controller-manager
terminationGracePeriodSeconds: 30
tolerations:
- effect: NoSchedule
key: node.cloudprovider.kubernetes.io/uninitialized
value: "true"
- key: CriticalAddonsOnly
operator: Exists
- effect: NoSchedule
key: node-role.kubernetes.io/master
operator: Exists
- effect: NoSchedule
key: node-role.kubernetes.io/control-plane
operator: Exists
- effect: NoExecute
key: node.kubernetes.io/not-ready
status:
conditions:
- lastTransitionTime: "2024-04-04T11:19:10Z"
lastUpdateTime: "2024-04-04T11:19:11Z"
message: ReplicaSet "hcloud-cloud-controller-manager-6f454fcfbf" has successfully
progressed.
reason: NewReplicaSetAvailable
status: "True"
type: Progressing
- lastTransitionTime: "2024-04-04T11:19:19Z"
lastUpdateTime: "2024-04-04T11:19:19Z"
message: Deployment does not have minimum availability.
reason: MinimumReplicasUnavailable
status: "False"
type: Available
observedGeneration: 1
replicas: 1
unavailableReplicas: 1
updatedReplicas: 1
kubectl get node k3s-controlplane1 -o yaml
apiVersion: v1
kind: Node
metadata:
annotations:
alpha.kubernetes.io/provided-node-ip: 10.0.0.2
etcd.k3s.cattle.io/local-snapshots-timestamp: "2024-04-04T11:08:33Z"
etcd.k3s.cattle.io/node-address: 10.0.0.2
etcd.k3s.cattle.io/node-name: k3s-controlplane1-ba0bd5a4
k3s.io/node-args: '["server","--data-dir","/var/lib/rancher/k3s","--disable","traefik","--disable","servicelb","--flannel-backend","none","--disable-network-policy","--embedded-registry","true","--write-kubeconfig-mode","0600","--tls-san","lbctrl.iquestria.cso.ninja","--disable-cloud-controller","--token","********","--tls-san","k3s-controlplane1","--tls-san","10.0.0.2","--node-ip","10.0.0.2","--node-external-ip","x.x.x.x","--kubelet-arg","cloud-provider=external"]'
k3s.io/node-config-hash: QNU4YAKJZSOORINBMHYXXYIO754HSV5OGAWEWZC56NJR74RX56AQ====
k3s.io/node-env: '{"K3S_DATA_DIR":"/var/lib/rancher/k3s/data/4344eae0657f7fc0c99af34fc51358389f500f18c9bb80f5a55c130de07565d2"}'
node.alpha.kubernetes.io/ttl: "0"
p2p.k3s.cattle.io/node-address: /ip4/10.0.0.2/tcp/5001/p2p/QmWjS45ca9RZuoMnavYUhNHH4wD7V4SXVHRhzcn1tCWNdi
volumes.kubernetes.io/controller-managed-attach-detach: "true"
creationTimestamp: "2024-04-04T11:07:10Z"
finalizers:
- wrangler.cattle.io/node
- wrangler.cattle.io/managed-etcd-controller
labels:
beta.kubernetes.io/arch: arm64
beta.kubernetes.io/os: linux
kubernetes.io/arch: arm64
kubernetes.io/hostname: k3s-controlplane1
kubernetes.io/os: linux
node-role.kubernetes.io/control-plane: "true"
node-role.kubernetes.io/etcd: "true"
node-role.kubernetes.io/master: "true"
p2p.k3s.cattle.io/enabled: "true"
name: k3s-controlplane1
resourceVersion: "4135"
uid: c1b6d78b-55dc-47f8-9ba0-557b81a452a7
spec:
podCIDR: 10.42.0.0/24
podCIDRs:
- 10.42.0.0/24
taints:
- effect: NoSchedule
key: node.cloudprovider.kubernetes.io/uninitialized
value: "true"
status:
addresses:
- address: 10.0.0.2
type: InternalIP
- address: k3s-controlplane1
type: Hostname
allocatable:
cpu: "4"
ephemeral-storage: "55192664021"
hugepages-1Gi: "0"
hugepages-2Mi: "0"
hugepages-32Mi: "0"
hugepages-64Ki: "0"
memory: 7934528Ki
pods: "110"
capacity:
cpu: "4"
ephemeral-storage: 56735880Ki
hugepages-1Gi: "0"
hugepages-2Mi: "0"
hugepages-32Mi: "0"
hugepages-64Ki: "0"
memory: 7934528Ki
pods: "110"
conditions:
- lastHeartbeatTime: "2024-04-04T11:10:25Z"
lastTransitionTime: "2024-04-04T11:10:25Z"
message: Cilium is running on this node
reason: CiliumIsUp
status: "False"
type: NetworkUnavailable
- lastHeartbeatTime: "2024-04-04T11:22:30Z"
lastTransitionTime: "2024-04-04T11:07:22Z"
message: Node is a voting member of the etcd cluster
reason: MemberNotLearner
status: "True"
type: EtcdIsVoter
- lastHeartbeatTime: "2024-04-04T11:20:46Z"
lastTransitionTime: "2024-04-04T11:07:10Z"
message: kubelet has sufficient memory available
reason: KubeletHasSufficientMemory
status: "False"
type: MemoryPressure
- lastHeartbeatTime: "2024-04-04T11:20:46Z"
lastTransitionTime: "2024-04-04T11:07:10Z"
message: kubelet has no disk pressure
reason: KubeletHasNoDiskPressure
status: "False"
type: DiskPressure
- lastHeartbeatTime: "2024-04-04T11:20:46Z"
lastTransitionTime: "2024-04-04T11:07:10Z"
message: kubelet has sufficient PID available
reason: KubeletHasSufficientPID
status: "False"
type: PIDPressure
- lastHeartbeatTime: "2024-04-04T11:20:46Z"
lastTransitionTime: "2024-04-04T11:10:20Z"
message: kubelet is posting ready status. AppArmor enabled
reason: KubeletReady
status: "True"
type: Ready
daemonEndpoints:
kubeletEndpoint:
Port: 10250
images:
- names:
- quay.io/cilium/cilium@sha256:bfeb3f1034282444ae8c498dca94044df2b9c9c8e7ac678e0b43c849f0b31746
sizeBytes: 195832613
- names:
- quay.io/cilium/operator-generic@sha256:4dd8f67630f45fcaf58145eb81780b677ef62d57632d7e4442905ad3226a9088
sizeBytes: 24175419
- names:
- docker.io/rancher/mirrored-pause@sha256:74c4244427b7312c5b901fe0f67cbc53683d06f4f24c6faee65d4182bf0fa893
- docker.io/rancher/mirrored-pause:3.6
sizeBytes: 253243
nodeInfo:
architecture: arm64
bootID: b44ffa8e-82e2-4740-b6ab-bf53631f8310
containerRuntimeVersion: containerd://1.7.11-k3s2
kernelVersion: 6.1.0-18-arm64
kubeProxyVersion: v1.29.2+k3s1
kubeletVersion: v1.29.2+k3s1
machineID: e7c1065f9ccd42ce8d0c10c61a494f91
operatingSystem: linux
osImage: Debian GNU/Linux 12 (bookworm)
systemUUID: 2376c8c9-a1c5-4485-8bea-efcfa76fb865
With
networking:
enabled: false
network:
valueFrom:
secretKeyRef:
name: hcloud
key: network
robot:
enabled: false
there is no HCLOUD_NETWORK env variable set:
kubectl get deployment -n kube-system hcloud-cloud-controller-manager -o yaml
apiVersion: apps/v1
kind: Deployment
metadata:
annotations:
deployment.kubernetes.io/revision: "1"
meta.helm.sh/release-name: hccm
meta.helm.sh/release-namespace: kube-system
creationTimestamp: "2024-04-04T11:10:32Z"
generation: 1
labels:
app.kubernetes.io/managed-by: Helm
name: hcloud-cloud-controller-manager
namespace: kube-system
resourceVersion: "2171"
uid: e97fe5ed-db35-4eaf-a290-371b87780a2c
spec:
progressDeadlineSeconds: 600
replicas: 1
revisionHistoryLimit: 2
selector:
matchLabels:
app.kubernetes.io/instance: hccm
app.kubernetes.io/name: hcloud-cloud-controller-manager
strategy:
rollingUpdate:
maxSurge: 25%
maxUnavailable: 25%
type: RollingUpdate
template:
metadata:
creationTimestamp: null
labels:
app.kubernetes.io/instance: hccm
app.kubernetes.io/name: hcloud-cloud-controller-manager
spec:
containers:
- command:
- /bin/hcloud-cloud-controller-manager
- --allow-untagged-cloud
- --cloud-provider=hcloud
- --route-reconciliation-period=30s
- --webhook-secure-port=0
- --leader-elect=false
env:
- name: HCLOUD_TOKEN
valueFrom:
secretKeyRef:
key: token
name: hcloud
- name: ROBOT_PASSWORD
valueFrom:
secretKeyRef:
key: robot-password
name: hcloud
optional: true
- name: ROBOT_USER
valueFrom:
secretKeyRef:
key: robot-user
name: hcloud
optional: true
image: hetznercloud/hcloud-cloud-controller-manager:v1.19.0
imagePullPolicy: IfNotPresent
name: hcloud-cloud-controller-manager
ports:
- containerPort: 8233
name: metrics
protocol: TCP
resources:
requests:
cpu: 100m
memory: 50Mi
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
dnsPolicy: Default
priorityClassName: system-cluster-critical
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
serviceAccount: hcloud-cloud-controller-manager
serviceAccountName: hcloud-cloud-controller-manager
terminationGracePeriodSeconds: 30
tolerations:
- effect: NoSchedule
key: node.cloudprovider.kubernetes.io/uninitialized
value: "true"
- key: CriticalAddonsOnly
operator: Exists
- effect: NoSchedule
key: node-role.kubernetes.io/master
operator: Exists
- effect: NoSchedule
key: node-role.kubernetes.io/control-plane
operator: Exists
- effect: NoExecute
key: node.kubernetes.io/not-ready
status:
availableReplicas: 1
conditions:
- lastTransitionTime: "2024-04-04T11:10:33Z"
lastUpdateTime: "2024-04-04T11:10:37Z"
message: ReplicaSet "hcloud-cloud-controller-manager-584f6fc4f4" has successfully
progressed.
reason: NewReplicaSetAvailable
status: "True"
type: Progressing
- lastTransitionTime: "2024-04-04T11:13:22Z"
lastUpdateTime: "2024-04-04T11:13:22Z"
message: Deployment has minimum availability.
reason: MinimumReplicasAvailable
status: "True"
type: Available
observedGeneration: 1
readyReplicas: 1
replicas: 1
updatedReplicas: 1
I appreciate the help.
For the sake of completeness: without HCLOUD_NETWORK
being set, hccm is not able to fetch the metadata.
[...]
I0404 11:13:24.431083 1 controllermanager.go:337] Started "cloud-node-lifecycle-controller"
I0404 11:13:24.431122 1 node_lifecycle_controller.go:113] Sending events to api server
I0404 11:13:24.512098 1 shared_informer.go:318] Caches are synced for RequestHeaderAuthRequestController
I0404 11:13:24.512144 1 shared_informer.go:318] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0404 11:13:24.512166 1 shared_informer.go:318] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0404 11:13:24.531534 1 shared_informer.go:318] Caches are synced for service
I0404 11:13:24.531581 1 node_controller.go:431] Initializing node k3s-controlplane1 with cloud provider
E0404 11:13:24.964475 1 node_controller.go:240] error syncing 'k3s-controlplane1': failed to get node modifiers from cloud provider: provided node ip for node "k3s-controlplane1" is not valid: failed to get node address from cloud provider that matches ip: 10.0.0.2, requeuing
I0404 11:13:24.964549 1 node_controller.go:431] Initializing node k3s-controlplane2 with cloud provider
E0404 11:13:25.149436 1 node_controller.go:240] error syncing 'k3s-controlplane2': failed to get node modifiers from cloud provider: provided node ip for node "k3s-controlplane2" is not valid: failed to get node address from cloud provider that matches ip: 10.0.0.3, requeuing
I0404 11:13:25.149485 1 node_controller.go:431] Initializing node k3s-controlplane3 with cloud provider
E0404 11:13:25.317226 1 node_controller.go:240] error syncing 'k3s-controlplane3': failed to get node modifiers from cloud provider: provided node ip for node "k3s-controlplane3" is not valid: failed to get node address from cloud provider that matches ip: 10.0.0.4, requeuing
Thanks for the detailed responses :)
I can reproduce the issue with these values from your comment yesterday:
env:
HCLOUD_TOKEN:
valueFrom:
secretKeyRef:
name: hcloud
key: token
HCLOUD_NETWORK:
valueFrom:
secretKeyRef:
name: hcloud
key: network
networking:
enabled: false
robot:
enabled: false
The core issue is that hccm and the Helm chart always assume that users with Networks also want to use the routing functionality. This is not always true; there are cases where you want the InternalIP
on the Node but no routes. As you have discovered, this is not natively supported in the Helm chart right now.
You can set the env variable HCLOUD_NETWORK_ROUTES_ENABLED=false
to disable just the routes controller.
These values should work (or just yours with the env variable added):
env:
HCLOUD_NETWORK_ROUTES_ENABLED:
value: "false"
networking:
enabled: true
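Putting the two together, a complete values file along the lines of the original setup might look like this (an untested sketch: it assumes the same `hcloud` secret from earlier in the thread, and that the chart accepts `networking.enabled: true` without `clusterCIDR` when the routes controller is disabled via the env variable):

```yaml
env:
  HCLOUD_TOKEN:
    valueFrom:
      secretKeyRef:
        name: hcloud
        key: token
  HCLOUD_NETWORK:
    valueFrom:
      secretKeyRef:
        name: hcloud
        key: network
  # Disables only the node-route-controller; InternalIPs are still
  # resolved from the Hetzner Network.
  HCLOUD_NETWORK_ROUTES_ENABLED:
    value: "false"
networking:
  enabled: true
```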
Thank you. Will test that.
With HCLOUD_NETWORK_ROUTES_ENABLED=false,
can we configure ROBOT_ENABLED=true
so that the dedicated nodes are handled by hcloud-cloud-controller-manager, too?
Yes, that should work :+1: You will have to do some magic to get the private IPs for the Robot servers in, as that is not automatically supported in hccm right now.
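One possible workaround (an assumption on my part, not an official hccm feature) is to pin the private address on the kubelet side, the same way the cloud nodes in this thread already use `--node-ip`. With k3s that could look roughly like this for a Robot agent node; the address `10.0.0.10` is a placeholder for whatever your vSwitch/private network assigns to the dedicated server:

```
# Hypothetical sketch: join a Robot server as a k3s agent with a pinned
# private IP, since hccm cannot discover private IPs for Robot servers.
curl -sfL https://get.k3s.io | \
  INSTALL_K3S_EXEC="agent --node-ip=10.0.0.10 --kubelet-arg=cloud-provider=external" \
  K3S_URL="https://<server>:6443" K3S_TOKEN="<token>" sh -
```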