Error in IPv6 environment
The hcloud CSI driver does not seem to work in my IPv6 Kubernetes cluster. The cluster was set up with kubeadm, and the CNI (Calico) is configured with an IPv6 pool. Routing works: containers can reach the IPv4 internet via a NAT64 gateway. The only remaining issue is persistent storage, for which I wanted to use the hcloud CSI driver. I first configured the driver with the hcloud API secret as described in the README, then applied https://raw.githubusercontent.com/kubernetes/csi-api/release-1.14/pkg/crd/manifests/csidriver.yaml and https://raw.githubusercontent.com/kubernetes/csi-api/release-1.14/pkg/crd/manifests/csinodeinfo.yaml. The setup was finished off with https://raw.githubusercontent.com/hetznercloud/csi-driver/v1.2.2/deploy/kubernetes/hcloud-csi.yml (I set the metrics host to ":::9189" instead of the default "0.0.0.0:9189", but this does not seem to matter; deploying with the default metrics host fails the same way). When the hcloud CSI driver starts, I get hit with this error:
Notebook:~ zejar$ kubectl -n kube-system describe pods hcloud-csi-controller-0
Name: hcloud-csi-controller-0
Namespace: kube-system
Priority: 0
PriorityClassName: <none>
Node: worker-2/fd86:ea04:1111::12
Start Time: Thu, 09 Jan 2020 07:37:13 +0100
Labels: app=hcloud-csi-controller
controller-revision-hash=hcloud-csi-controller-7869579c4d
statefulset.kubernetes.io/pod-name=hcloud-csi-controller-0
Annotations: cni.projectcalico.org/podIP: fd00:1234::3:88f1/128
cni.projectcalico.org/podIPs: fd00:1234::3:88f1/128
Status: Running
IP: fd00:1234::3:88f1
Controlled By: StatefulSet/hcloud-csi-controller
Containers:
csi-attacher:
Container ID: docker://a87310584ca60d2813cec0f4c93af10002d16580cc6a448d2a050937be3d99f4
Image: quay.io/k8scsi/csi-attacher:v1.2.1
Image ID: docker-pullable://quay.io/k8scsi/csi-attacher@sha256:9125ce3c5c2ecfb5e17631190a3c839694b08cec172dd3da40d098a1b5eed89e
Port: <none>
Host Port: <none>
Args:
--csi-address=/var/lib/csi/sockets/pluginproxy/csi.sock
--v=5
State: Running
Started: Thu, 09 Jan 2020 07:37:15 +0100
Ready: True
Restart Count: 0
Environment: <none>
Mounts:
/var/lib/csi/sockets/pluginproxy/ from socket-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from hcloud-csi-token-fzs9x (ro)
csi-resizer:
Container ID: docker://3bece9a04d0e0be2ec11ae161053499828a735c5a2fd3522867d4fde5c03a105
Image: quay.io/k8scsi/csi-resizer:v0.3.0
Image ID: docker-pullable://quay.io/k8scsi/csi-resizer@sha256:eff2d6a215efd9450d90796265fc4d8832a54a3a098df06edae6ff3a5072b08f
Port: <none>
Host Port: <none>
Args:
--csi-address=/var/lib/csi/sockets/pluginproxy/csi.sock
--v=5
State: Running
Started: Thu, 09 Jan 2020 07:37:15 +0100
Ready: True
Restart Count: 0
Environment: <none>
Mounts:
/var/lib/csi/sockets/pluginproxy/ from socket-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from hcloud-csi-token-fzs9x (ro)
csi-provisioner:
Container ID: docker://7062801818abe94c5ae217df07d77a0627abb72059d9b1ce2a2a71155f90a4c6
Image: quay.io/k8scsi/csi-provisioner:v1.3.1
Image ID: docker-pullable://quay.io/k8scsi/csi-provisioner@sha256:d657c839dce87324fe2b677302913f9386f885f8746be7bea0ced5b0844e3433
Port: <none>
Host Port: <none>
Args:
--provisioner=csi.hetzner.cloud
--csi-address=/var/lib/csi/sockets/pluginproxy/csi.sock
--feature-gates=Topology=true
--v=5
State: Running
Started: Thu, 09 Jan 2020 07:37:47 +0100
Last State: Terminated
Reason: Error
Exit Code: 255
Started: Thu, 09 Jan 2020 07:37:16 +0100
Finished: Thu, 09 Jan 2020 07:37:46 +0100
Ready: True
Restart Count: 1
Environment: <none>
Mounts:
/var/lib/csi/sockets/pluginproxy/ from socket-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from hcloud-csi-token-fzs9x (ro)
hcloud-csi-driver:
Container ID: docker://07e30448705c4607541ffe385eb4d958d2f6c0dfd74fcfc6eb67b70d08196c78
Image: hetznercloud/hcloud-csi-driver:1.2.2
Image ID: docker-pullable://hetznercloud/hcloud-csi-driver@sha256:c17cd36fbc4223d76824e164f0238393cd21e0cc9f8710d807b532fbd7f0f480
Port: 9189/TCP
Host Port: 0/TCP
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Thu, 09 Jan 2020 07:58:47 +0100
Finished: Thu, 09 Jan 2020 07:58:47 +0100
Ready: False
Restart Count: 9
Environment:
CSI_ENDPOINT: unix:///var/lib/csi/sockets/pluginproxy/csi.sock
METRICS_ENDPOINT: :::9189
HCLOUD_TOKEN: <set to the key 'token' in secret 'hcloud-csi'> Optional: false
Mounts:
/var/lib/csi/sockets/pluginproxy/ from socket-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from hcloud-csi-token-fzs9x (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
socket-dir:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
hcloud-csi-token-fzs9x:
Type: Secret (a volume populated by a Secret)
SecretName: hcloud-csi-token-fzs9x
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 24m default-scheduler Successfully assigned kube-system/hcloud-csi-controller-0 to worker-2
Normal Started 24m kubelet, worker-2 Started container csi-resizer
Normal Created 24m kubelet, worker-2 Created container csi-attacher
Normal Started 24m kubelet, worker-2 Started container csi-attacher
Normal Pulled 24m kubelet, worker-2 Container image "quay.io/k8scsi/csi-resizer:v0.3.0" already present on machine
Normal Created 24m kubelet, worker-2 Created container csi-resizer
Normal Pulled 24m kubelet, worker-2 Container image "quay.io/k8scsi/csi-attacher:v1.2.1" already present on machine
Normal Created 24m kubelet, worker-2 Created container csi-provisioner
Normal Started 24m kubelet, worker-2 Started container csi-provisioner
Normal Pulling 24m (x3 over 24m) kubelet, worker-2 Pulling image "hetznercloud/hcloud-csi-driver:1.2.2"
Normal Started 24m (x3 over 24m) kubelet, worker-2 Started container hcloud-csi-driver
Normal Pulled 24m (x3 over 24m) kubelet, worker-2 Successfully pulled image "hetznercloud/hcloud-csi-driver:1.2.2"
Normal Created 24m (x3 over 24m) kubelet, worker-2 Created container hcloud-csi-driver
Normal Pulled 24m (x2 over 24m) kubelet, worker-2 Container image "quay.io/k8scsi/csi-provisioner:v1.3.1" already present on machine
Warning BackOff 4m54s (x96 over 24m) kubelet, worker-2 Back-off restarting failed container
and
Notebook:~ zejar$ kubectl -n kube-system logs -f hcloud-csi-controller-0 hcloud-csi-driver
level=debug ts=2020-01-09T06:53:36.846178303Z msg="getting instance id from metadata service"
level=error ts=2020-01-09T06:53:36.846729779Z msg="failed to get instance id from metadata service" err="Get http://169.254.169.254/2009-04-04/meta-data/instance-id: dial tcp 169.254.169.254:80: connect: network is unreachable"
My Environment
- kubectl get nodes:
Notebook:~ zejar$ kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
master-1 Ready master 43h v1.17.0 fd86:ea04:1111::1 <none> Debian GNU/Linux 10 (buster) 4.19.0-6-amd64 docker://19.3.5
worker-1 Ready <none> 43h v1.17.0 fd86:ea04:1111::11 <none> Debian GNU/Linux 10 (buster) 4.19.0-6-amd64 docker://19.3.5
worker-2 Ready <none> 18h v1.17.0 fd86:ea04:1111::12 <none> Debian GNU/Linux 10 (buster) 4.19.0-6-amd64 docker://19.3.5
worker-3 Ready <none> 18h v1.17.0 fd86:ea04:1111::13 <none> Debian GNU/Linux 10 (buster) 4.19.0-6-amd64 docker://19.3.5
- kubectl get pods --all-namespaces:
Notebook:~ zejar$ kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
default test-shell-66f858c55f-8nt2z 1/1 Running 0 19h
kube-system calico-kube-controllers-648f4868b8-pkm6c 1/1 Running 28 43h
kube-system calico-node-99mvf 1/1 Running 15 43h
kube-system calico-node-m5mmn 1/1 Running 4 19h
kube-system calico-node-p4b8t 1/1 Running 0 18h
kube-system calico-node-p8bp7 1/1 Running 0 18h
kube-system coredns-6955765f44-mrrzj 1/1 Running 14 43h
kube-system coredns-6955765f44-x4pqt 1/1 Running 14 43h
kube-system etcd-master-1 1/1 Running 429 43h
kube-system hcloud-csi-controller-0 3/4 CrashLoopBackOff 11 31m
kube-system hcloud-csi-node-9wsdp 1/2 CrashLoopBackOff 10 31m
kube-system hcloud-csi-node-kjzpj 1/2 CrashLoopBackOff 10 31m
kube-system hcloud-csi-node-vqjnh 1/2 CrashLoopBackOff 10 31m
kube-system kube-apiserver-master-1 1/1 Running 383 43h
kube-system kube-controller-manager-master-1 1/1 Running 23 43h
kube-system kube-proxy-7smnt 1/1 Running 0 18h
kube-system kube-proxy-b8gz8 1/1 Running 15 43h
kube-system kube-proxy-vs2qk 1/1 Running 13 43h
kube-system kube-proxy-xnfw2 1/1 Running 0 18h
kube-system kube-scheduler-master-1 1/1 Running 23 43h
- kubectl get services:
Notebook:~ zejar$ kubectl get services --all-namespaces
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default kubernetes ClusterIP fd00:1234::1 <none> 443/TCP 43h
kube-system hcloud-csi-controller-metrics ClusterIP fd00:1234::e722 <none> 9189/TCP 32m
kube-system hcloud-csi-node-metrics ClusterIP fd00:1234::bca9 <none> 9189/TCP 31m
kube-system kube-dns ClusterIP fd00:1234::a <none> 53/UDP,53/TCP,9153/TCP 43h
- kubectl get sc:
Notebook:~ zejar$ kubectl get sc
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
hcloud-volumes (default) csi.hetzner.cloud Delete WaitForFirstConsumer true 33m
- OS (from /etc/os-release):
root@worker-1:~# cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 10 (buster)"
NAME="Debian GNU/Linux"
VERSION_ID="10"
VERSION="10 (buster)"
VERSION_CODENAME=buster
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
- Kernel (from uname -a):
root@worker-1:~# uname -a
Linux worker-1 4.19.0-6-amd64 #1 SMP Debian 4.19.67-2+deb10u2 (2019-11-11) x86_64 GNU/Linux
This works as intended: the metadata service (which is needed to identify the node) is available only over IPv4.
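For reference, the failing lookup can be reproduced by hand. This is just a sketch using curl, with the endpoint copied verbatim from the driver log above; on an IPv6-only node the TCP connect to the link-local IPv4 address fails before any HTTP exchange happens:

```shell
# Endpoint copied from the driver log above. 169.254.169.254 is link-local
# IPv4, so an IPv6-only node gets "connect: network is unreachable".
METADATA_ENDPOINT="http://169.254.169.254/2009-04-04/meta-data/instance-id"
curl --fail --silent --max-time 2 "$METADATA_ENDPOINT" \
  || echo "metadata service unreachable from this host"
```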
Hello there,
I'm hitting the same issue on my IPv6-only cluster.
@LKaemmerling I don't quite get why the metadata service is IPv4-only. Couldn't it listen on IPv6 as well?
@lel-amri We might have a workaround that bypasses the metadata server altogether. It will be implemented in the near future.
Hello @jooola,
Okay, thanks for the feedback. I'm eager to see a solution for this.
In the meantime, I'm using the following workaround:
My cluster is running K3s v1.29.2+k3s1 with Cilium 0.15.2 as the CNI. I patched the hcloud CSI driver to allow configuring the metadata endpoint, then set up a DaemonSet that runs a socat instance in the host network namespace, listening on a ULA address and relaying connections to 169.254.169.254:80.
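In short: the patched driver reads the metadata URL from the HCLOUD_METADATA_ENDPOINT environment variable (see the helm values below), which points at a ULA address and port that the socat DaemonSet relays to 169.254.169.254:80 on every node. A sketch of how that URL is assembled, using the same values as the manifests that follow:

```shell
# Values mirror the DaemonSet env vars and helm values further down.
LISTEN_ADDRESS="fd96:7b7a:e945:3:6d65:7461:6461:7461"  # ULA the proxy binds on every node
LISTEN_PORT="13752"
HCLOUD_METADATA_ENDPOINT="http://[${LISTEN_ADDRESS}]:${LISTEN_PORT}/hetzner/v1/metadata"
echo "$HCLOUD_METADATA_ENDPOINT"
```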
Here are the details, in case they are of interest to someone:
Build the patched hcloud-csi Docker image:
- Clone https://github.com/hetznercloud/csi-driver/commit/8538bfec8a750d18788356bb61e15e66c5e4a7ec
- Build the image:
mkdir docker-build-context
CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -o ./docker-build-context/controller.bin ./cmd/controller/main.go
CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -o ./docker-build-context/node.bin ./cmd/node/main.go
podman build -f ./Dockerfile -t devnull.superlel.me/hetznercloud/hcloud-csi-driver:v2.7.0-with-custom-metadata-endpoint ./docker-build-context/
- Load the image to your nodes. I'm using
podman image save --format oci-archive -o hcloud-csi.tar devnull.superlel.me/hetznercloud/hcloud-csi-driver:v2.7.0-with-custom-metadata-endpoint
followed by an scp of hcloud-csi.tar to all the nodes, then a ctr image import hcloud-csi.tar on each node.
I used the following helm chart values:
controller:
extraEnvVars:
- name: "HCLOUD_METADATA_ENDPOINT"
value: "http://[fd96:7b7a:e945:3:6d65:7461:6461:7461]:13752/hetzner/v1/metadata"
image:
hcloudCSIDriver:
name: "devnull.superlel.me/hetznercloud/hcloud-csi-driver:v2.7.0-with-custom-metadata-endpoint"
node:
extraEnvVars:
- name: "HCLOUD_METADATA_ENDPOINT"
value: "http://[fd96:7b7a:e945:3:6d65:7461:6461:7461]:13752/hetzner/v1/metadata"
image:
hcloudCSIDriver:
name: "devnull.superlel.me/hetznercloud/hcloud-csi-driver:v2.7.0-with-custom-metadata-endpoint"
Finally, I made a Docker image that embeds socat and a script to set up the "6 -> 4 proxy" for the Hetzner metadata service:
Dockerfile:
FROM docker.io/library/alpine:3.20
RUN apk add --no-cache socat
COPY --chmod=755 init.sh /init.sh
COPY --chmod=755 main.sh /main.sh
ENTRYPOINT ["/main.sh"]
init.sh:
#!/bin/sh
set -u
{ err=$(ip address add "$HETZNER_METADATA_SERVICE_PROXY64_LISTEN_ADDRESS"/128 dev cilium_host 2>&1 >&3 3>&-); } 3>&1
ret=$?
printf "%s\n" "$err" >&2
if [ $ret != 0 ] ; then
case "$err" in
*"RTNETLINK answers: File exists")
exit 0
;;
esac
fi
exit $ret
main.sh:
#!/bin/sh
exec socat -dd TCP6-LISTEN:"$HETZNER_METADATA_SERVICE_PROXY64_LISTEN_PORT",bind="$HETZNER_METADATA_SERVICE_PROXY64_LISTEN_ADDRESS",ipv6only=1,fork TCP4:169.254.169.254:80
I then built the image and pushed it to the nodes, and finally deployed the following DaemonSet:
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: hms-p64
namespace: kube-system
spec:
selector:
matchLabels:
name: hms-p64
template:
metadata:
labels:
name: hms-p64
spec:
tolerations:
- key: node-role.kubernetes.io/control-plane
operator: Exists
effect: NoSchedule
initContainers:
- name: ip-address-add
env:
- name: "HETZNER_METADATA_SERVICE_PROXY64_LISTEN_ADDRESS"
value: "fd96:7b7a:e945:3:6d65:7461:6461:7461"
- name: "HETZNER_METADATA_SERVICE_PROXY64_LISTEN_PORT"
value: "13752"
image: devnull.superlel.me/hetzner-metadata-service-proxy64:latest
imagePullPolicy: Never
command:
- /init.sh
resources:
limits:
memory: 200Mi
requests:
cpu: 100m
memory: 200Mi
securityContext:
capabilities:
add:
- NET_ADMIN
containers:
- name: hms-p64
env:
- name: "HETZNER_METADATA_SERVICE_PROXY64_LISTEN_ADDRESS"
value: "fd96:7b7a:e945:3:6d65:7461:6461:7461"
- name: "HETZNER_METADATA_SERVICE_PROXY64_LISTEN_PORT"
value: "13752"
image: devnull.superlel.me/hetzner-metadata-service-proxy64:latest
imagePullPolicy: Never
resources:
limits:
memory: 200Mi
requests:
cpu: 100m
memory: 200Mi
hostNetwork: true
Any news?
I deployed a single-stack IPv6 cluster (Talos) today and hit the same "failed to get instance id from metadata service" error.
It seems dual stack is the only way at the moment, with a private IPv4 network (required) and at least an IPv6 public interface; in that case the driver retrieves metadata via 100.64.0.0/10 (which comes with eth0).
Having something like fe80::a9fe:a9fe/128 as an IPv6 counterpart to 169.254.169.254 would solve so many issues!
It would probably also eliminate the need for this CGNAT hack.
Edit: I forgot to mention the workaround at the time of posting. Setting node.hostNetwork to true "fixes" the issue.
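For anyone using the Helm chart, that workaround is just the following values fragment (a sketch, assuming the chart exposes node.hostNetwork as described above):

```yaml
node:
  hostNetwork: true
```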
Curiously, the controller works on the pod network out of the box, but if you use an invalid token it prints the misleading error above.
Hello there, it looks like a4c985b9ca7180383723e9d514ad9b8f46006f15 could fix the issue.
Third edit: never mind, this does not fix the issue; I thought it was buried deeper in the business logic. Could we re-open this issue, though?
What about https://github.com/hetznercloud/csi-driver/compare/main...lel-amri:hcloud-csi:metadata-service-as-a-fallback?
This issue has been marked as stale because it has not had recent activity. The bot will close the issue if no further action occurs.