kubeadm icon indicating copy to clipboard operation
kubeadm copied to clipboard

kubeadm: support rotation of etcd certs even if etcd is not upgraded

Open KlavsKlavsen opened this issue 7 months ago • 10 comments

What happened?

our kube-apiserver just started failing to start.. The log from containerd only says: 2023-12-16T07:35:01.426019865+01:00 stderr F }. Err: connection error: desc = "transport: authentication handshake failed: x509: certificate has expired or is not yet valid: current time 2023-12-16T06:35:01Z is after 2023-12-03T03:37:19Z"

we checked apiserver certs etc. -and they were all fine. We finally checked etcd endpoint - and that had an old cert (which for some reason was not updated with kubeadm update

kubeadm certs check-expiration had the details - but IMHO kubeadm k8s update SHOULD look at certs expiration and renew them all - instead of only renewing etcd certs IF there's an update for it.

What did you expect to happen?

a logmessage that told me which endpoint had the bad cert and details about cert.

We had updated using kubeadm and it renewed certs - but appearently it does not renew etcd certs - if there is no update of that component - which surprised us - as we thought it renewed all certs for cluster on upgrades :(

How can we reproduce it (as minimally and precisely as possible)?

update via kubeadm so etcd certs are not being renewed and "adjust OS time" - so etcd certs are expired but rest are not.

Anything else we need to know?

No response

Kubernetes version

Server Version: v1.26.4

we'll update soon (above might have been fixed in newer version?)

Cloud provider

Hetzner physical server

OS version

Ubuntu 22.04

Install tools

Container runtime (CRI) and version (if applicable)

Related plugins (CNI, CSI, ...) and versions (if applicable)

KlavsKlavsen avatar Jan 05 '24 13:01 KlavsKlavsen

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Jan 05 '24 13:01 k8s-ci-robot

/sig api-machinery

KlavsKlavsen avatar Jan 05 '24 13:01 KlavsKlavsen

/transfer kubeadm

neolit123 avatar Jan 05 '24 20:01 neolit123

kubeadm certs check-expiration had the details - but IMHO kubeadm k8s update SHOULD look at certs expiration and renew them all - instead of only renewing etcd certs IF there's an update for it.

yes, kubeadm upgrade does not rotate etcd certs if etcd is not upgraded. ~what versions of k8s are you upgrading from -> to? do they use the same etcd version?~

you could manually force renewal with kubeadm certs renew ...

EDIT: also if you upgrade more often, etcd certs will be rotated. k8s releases every 4 months and etcd versions are definitely upgraded them. etcd PATCH versions are also sometimes updated on k8s PATCH version.

neolit123 avatar Jan 05 '24 20:01 neolit123

i think this is the ~3rd time i see confusing around the kubeadm upgrade behavior of etcd and certs rotation, so i'm tagging this as a feature request.

neolit123 avatar Jan 05 '24 20:01 neolit123

our kube-apiserver just started failing to start.. The log from containerd only says: 2023-12-16T07:35:01.426019865+01:00 stderr F }. Err: connection error: desc = "transport: authentication handshake failed: x509: certificate has expired or is not yet valid: current time 2023-12-16T06:35:01Z is after 2023-12-03T03:37:19Z"

hm, i probably should not have transferred this to the kubeadm repository as the log problem is indeed an api-server / api-machinery one. you could log the k/k ticket again, but copying your description and we can repurpose this ticket here to investigate if we can rotate etcd certs on upgrade if other certs are rotated, or ... just update docs why this happens.

https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/

neolit123 avatar Jan 05 '24 20:01 neolit123

kubeadm: support rotation of etcd certs even if etcd is not upgraded

cc @SataQiu @pacoxu WDYT about A) rotating etcd certs every time k8s version upgrades vs B) just documenting that cert rotation for etcd is skipped if there is no etcd upgrade.

neolit123 avatar Jan 05 '24 20:01 neolit123

WDYT about A) rotating etcd certs every time k8s version upgrades vs B) just documenting that cert rotation for etcd is skipped if there is no etcd upgrade.

  1. With the default etcd certificate expiration date, if their cluster is upgraded more frequently than once a year, it will never expire.
  2. We may not take manually change etcd version into account(but it is a corner case).

A) rotating etcd certs every time k8s version upgrades +1 as this is a low cost thing.

  1. In https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-certs/#automatic-certificate-renewal,

kubeadm renews all the certificates during control plane upgrade.

+1 for A) as we mentioned this in the doc 😄

Another choice may be adding a preflight check(error or warning) for cert expire when you run upgrade if the cert will be expire in 4 months.(a release cycle duration). It can make users do the renew manually if needed.

BTW, do we have a suggestion about how many months before expire users should renew certs?

pacoxu avatar Jan 08 '24 07:01 pacoxu

With the default etcd certificate expiration date, if their cluster is upgraded more frequently than once a year, it will never expire

exactly, this tells us that @KlavsKlavsen did no upgrade their cluster on time - i.e. at least once a year. of course, there is the wider k8s problem here, that MINOR version upgrades can scare people if there are deprecations and removals of core APIs.

+1 for A) as we mentioned this in the doc 😄

i was under the impression that we have docs that etcd cert rotation is skipped if etcd upgrade is skipped. but i can't find it, so apparently not.

Another choice may be adding a preflight check(error or warning) for cert expire when you run upgrade if the cert will be expire in 4 months.(a release cycle duration). It can make users do the renew manually if needed.

it could be added as something that "upgrade plan" does. but i also vote for A), because that's the simpler UX. manual cert rotation is a bit advanced for the regular user and needs orchestration of component restarts.

BTW, do we have a suggestion about how many months before expire users should renew certs?

the kubelet also has one year client certs (by default, with 1024 byte keys, RSA), that it rotates if ~80% of the time in one year has passed - i.e. ~ 0.8 * 365 days

neolit123 avatar Jan 08 '24 07:01 neolit123

we seem to have some agreement that this is a desired change in kubeadm. i will add the help-wanted label but the change might not be so trivial.

neolit123 avatar Jan 17 '24 13:01 neolit123