hcloud-cloud-controller-manager
x509: certificate is valid for 127.0.0.1, <ext. IPv4>, not 10.0.1.4 after upgrade from k3s v1.22.9 to v1.22.13 or v1.22.12
Hello. I am currently trying to update Kubernetes from v1.22.9 to v1.22.13.
Unfortunately I am hitting an error, and I am not 100% sure what the cause is, so it could also be unrelated to the cloud controller.
After updating, trying to get the logs from a Pod returns this error:
Failed to load logs: Get "https://10.0.1.3:10250/containerLogs/kube-system/hcloud-cloud-controller-manager-ff885568d-cpgsx/hcloud-cloud-controller-manager?tailLines=502&timestamps=true": x509: certificate is valid for 127.0.0.1, <external IPv4>, not 10.0.1.3
As soon as I downgrade to v1.22.9 again, I can get the logs again.
Running openssl x509 -noout -text -in /var/lib/rancher/k3s/agent/serving-kubelet.crt on the nodes gives these SANs:
DNS:k3s-control-plane-0, DNS:localhost, IP Address:127.0.0.1, IP Address:<external IPv4>
(and also includes the external IPv6 when on k3s v1.22.13.)
But it does not include the mentioned internal IP 10.0.1.3 (10.0.1.3 is the IP of the node the Pod is running on).
However, that internal IP is also missing on k3s v1.22.9, so I don't think that is the actual cause?
Has anyone had any luck with updating, or had a similar issue?
It is deployed using a slightly modified version of https://github.com/StarpTech/k-andy
I already updated the cloud controller to v1.13.0, but still no luck.
Also, running k3s certificate rotate does not help.
Thanks in advance.
I am facing the same issue with a new k3s cluster using version v1.24.4+k3s1 and hcloud-ccm v1.13.0.
Edit: Probably related to this https://github.com/k3s-io/k3s/issues/5193
Thanks for the issue link. That was already more helpful than anything I found so far.
So as far as I understand, that means one of the following:
- the Hetzner CCM now needs to handle the fact that the internal IPs should also be included in the certs
- we need to keep the k3s internal CCM active, even though I thought the Hetzner CCM is a full replacement (meaning we remove the argument --disable-cloud-controller)
- we now need to find the internal IPs on our own and add them using --node-ip
What I did for now is retrieve the private IP via the server metadata API before the k3s installation with
NODE_IP=$(ssh -o StrictHostKeyChecking=no $USER@$SERVER_IP curl http://169.254.169.254/hetzner/v1/metadata/private-networks | grep "ip:" | cut -f 3 -d" ")
and then supply --node-ip $NODE_IP
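For reference, the private-networks metadata endpoint returns one YAML entry per attached network, starting roughly like this (value illustrative, further fields omitted):
- ip: 10.0.1.2
  ...
The grep "ip:" | cut -f 3 -d" " pipeline simply extracts the address from that "- ip: ..." line.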
Thanks. Did you set any other arguments? I set it for my control nodes, but they don't start anymore with that argument set. They seem to keep trying to elect a new leader, without success.
I only added
'--node-ip=10.0.1.2' \
etc. to the /etc/systemd/system/k3s.service file.
I can also see the IP being added to the cert, but it is now missing the external IP.
So I also tried to set
'--node-external-ip=<External-IPv4>' \
After that, the certificate looks fine, but the k3s service is still not starting.
The full entry in the service file looks like this:
ExecStart=/usr/local/bin/k3s \
server \
'--cluster-init' \
'--disable' \
'local-storage' \
'--disable-cloud-controller' \
'--disable' \
'traefik' \
'--disable' \
'servicelb' \
'--kubelet-arg=cloud-provider=external' \
'--kube-apiserver-arg=service-node-port-range=1-65535' \
'--node-ip=10.0.1.2' \
'--node-external-ip=<External-IPv4>' \
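After editing the unit file, the new arguments only take effect once systemd has reloaded it and the k3s service has been restarted:
systemctl daemon-reload
systemctl restart k3s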
I install my k3s server with k3sup. My full script for the control plane node (I don't use HA/cluster) looks like this:
#!/bin/bash
if ! command -v k3sup &> /dev/null
then
echo "k3sup could not be found"
exit 1
fi
K3S_VERSION="v1.24.4+k3s1"
SERVER_IP=""
USER="root"
echo "##################################"
echo "###### Installing Server #########"
echo "##################################"
NODE_IP=$(ssh -o StrictHostKeyChecking=no $USER@$SERVER_IP curl http://169.254.169.254/hetzner/v1/metadata/private-networks | grep "ip:" | cut -f 3 -d" ")
# Setup server
k3sup install \
--ip $SERVER_IP \
--user $USER \
--k3s-version "$K3S_VERSION" \
--k3s-extra-args "--disable servicelb,traefik,local-storage --node-ip $NODE_IP --node-taint CriticalAddonsOnly=true:NoExecute --flannel-backend=none --disable-network-policy --disable-cloud-controller --kubelet-arg cloud-provider=external"
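Once the node is up, a quick way to verify that the addresses were picked up is to check the INTERNAL-IP and EXTERNAL-IP columns (using the kubeconfig that k3sup fetched):
kubectl get nodes -o wide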
I don't see much difference. I don't know what you mean by "i don't use ha/cluster". Do you mean you are running only a single-server cluster?
I also found --tls-san, which according to its description adds the IP to the cert. But it does not add anything, and I still get the error when trying to access pod logs.
I read that --tls-san apparently only adds it on request, so if something tries to access it, it gets added. But apparently that also does not work, or else accessing logs should work, right?
So I am still lost here. I don't think downgrading is the best option (but currently it is my only one).
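Note that the failing endpoint is the kubelet on port 10250, while --tls-san (as far as I can tell) only affects the apiserver certificate on port 6443, which would explain why it makes no difference here. One way to see which SANs the kubelet is actually serving is to query it directly (assuming openssl is available and 10.0.1.3 is reachable from where you run this):
openssl s_client -connect 10.0.1.3:10250 </dev/null 2>/dev/null \
  | openssl x509 -noout -text | grep -A1 "Subject Alternative Name"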
I now removed --disable-cloud-controller on the control plane nodes, even though the readme of this repo says to set this argument.
Now I can access the Pod logs again. But I am not sure what other issues might follow from basically having 2 cloud controllers running now?
Just as an update: that workaround helped to get it working, but only most of the time.
It still sometimes gives the cert error, probably because both controllers are fighting each other.
So in my opinion the issue is still open.
This issue has been marked as stale because it has not had recent activity. The bot will close the issue if no further action occurs.
This is not complete and still needs some form of solution.
@lsascha maybe this will help you https://github.com/k3s-io/k3s/issues/6679
I am also running a fork of k-andy and ran into this. The root issue was that really all IPs have to be given to k3s via args.
Luckily the hcloud Terraform provider now has the primary_ip resource, so the IPs can be created before launching the control-plane servers and the external IP is known before server launch.
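For anyone not using Terraform, a rough hcloud CLI equivalent is to create the primary IP up front and reference it when creating the server; something like the following sketch (flag names from memory, please verify with hcloud primary-ip create --help and hcloud server create --help):
hcloud primary-ip create --type ipv4 --name control-plane-0-ip --datacenter fsn1-dc14
hcloud server create --name control-plane-0 --datacenter fsn1-dc14 --type cpx31 --image ubuntu-22.04 --primary-ipv4 control-plane-0-ip ...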
It's probably not a bug in the hcloud-cloud-controller-manager
Hey @lsascha,
sorry that this was autoclosed without a response from Hetzner Cloud.
This looks like a configuration error with k3s, and not related to the hcloud-cloud-controller-manager. We are not responsible for generating the certificates used by the k3s apiserver.
If you still think this is a bug in hcloud-cloud-controller-manager, please explain where you expect it to act differently.
@apricote Thanks.
But I am not so sure that it is not a bug in hcloud-cloud-controller-manager.
Yes, it is a newly introduced issue in combination with v1.22.10+k3s1.
But as far as I have understood, the controller is supposed to provide all necessary IPs so that the certificates are valid, which it seemingly is not doing for all IPs anymore?
But I could be wrong.
Also, I would prefer not having to recreate nodes just to maybe fix it, as @toabi suggests.
Also, since I tried keeping the default controller enabled: it works more often than not, but because two controllers are fighting each other, it always breaks randomly. So there is definitely some behaviour missing in the Hetzner controller compared to the default controller.
I migrated three clusters created with my k-andy fork to define all the internal and external IPs when launching k3s servers and nodes, and they now work without issues. It definitely works.
My guess would be that the built-in cloud controller is part of the startup process of k3s, and therefore all IP addresses are known when it is time to generate the certificates.
The hcloud controller only starts after the k3s startup, during which the certificate gets generated, so k3s cannot know about the internal IP addresses, which are not necessarily assigned at boot/startup time.
To generate the kubelet serving cert, k3s needs to know all IPs at startup time. That is why you need to provide the IPs through the arguments --node-ip and --node-external-ip to k3s.
If you then deploy the hcloud-cloud-controller-manager with network support, it will confirm the available IPs of the node, and the ExternalIP is added to Node.status.addresses.
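You can inspect what ends up in Node.status.addresses directly, for example (replace <node-name> with your node):
k3s kubectl get node <node-name> -o jsonpath='{.status.addresses}'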
I was able to successfully start a k3s server with v1.22.13+k3s1 using these commands, following your example (https://github.com/hetznercloud/hcloud-cloud-controller-manager/issues/316#issuecomment-1254077802):
hcloud network create --name k3s-private --ip-range 10.0.0.0/24
hcloud network add-subnet k3s-private --ip-range 10.0.0.0/24 --network-zone eu-central --type cloud
hcloud server create --name k3s-server --location hel1 --type cpx31 --ssh-key julian.toelle --image ubuntu-22.04 --network k3s-private
hcloud server ssh k3s-server
# Now on k3s-server
# Download k3s
curl -L -o k3s https://github.com/k3s-io/k3s/releases/download/v1.22.13%2Bk3s1/k3s
install k3s /usr/bin/k3s
# Start server
k3s server \
--cluster-init \
--disable local-storage,traefik,servicelb \
--disable-cloud-controller \
--kubelet-arg=cloud-provider=external \
--node-ip 10.0.0.2 \
--node-external-ip 65.109.183.203
# In a new terminal
hcloud server ssh k3s-server
k3s kubectl -n kube-system create secret generic hcloud --from-literal=token=<TOKEN> --from-literal=network=k3s-private
k3s kubectl apply -f https://github.com/hetznercloud/hcloud-cloud-controller-manager/releases/latest/download/ccm-networks.yaml
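As a check, fetching logs from a pod on the node exercises the same kubelet endpoint as in the original error, for example (deployment name taken from the pod name in the error above):
k3s kubectl -n kube-system logs deploy/hcloud-cloud-controller-manager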
But I was already able to retrieve logs from the kubelet before the hccm was deployed. I am still convinced this is a configuration issue in k3s. Please provide exact steps for me to follow to replicate your issue, @lsascha.