CA rotation: controller-manager needs a separate ca.crt file

Open anguslees opened this issue 6 years ago • 33 comments

What happened?

I tried to (manually) rotate my cluster's CA key over the weekend. I discovered that /etc/kubernetes/pki/ca.crt can actually include multiple CA certificates, and this is key (hah!) to rotating the CA key.

kube-controller-manager, however, can only accept a single certificate in the file pointed to by --cluster-signing-cert-file, since this certificate is used to sign things, not to verify them (so having multiple certificates doesn't make sense). kube-controller-manager exits immediately (with a helpful error) if --cluster-signing-cert-file includes multiple certificates.
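The single-certificate constraint described above can be checked before restarting the controller manager. The following is a minimal sketch, not the controller manager's own validation code; the function names `split_pem_certificates` and `check_signing_cert_file` are hypothetical, and the check only counts PEM blocks without parsing them.

```python
import re

# Matches one PEM-encoded certificate block (header, base64 body, footer).
PEM_CERT_RE = re.compile(
    r"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----",
    re.DOTALL,
)


def split_pem_certificates(pem_text: str) -> list[str]:
    """Return every PEM certificate block found in pem_text, in file order."""
    return PEM_CERT_RE.findall(pem_text)


def check_signing_cert_file(pem_text: str) -> None:
    """Raise ValueError unless pem_text holds exactly one certificate.

    This mirrors the behaviour reported in this issue (a signing cert file
    with several certificates is rejected); it is an illustration only.
    """
    count = len(split_pem_certificates(pem_text))
    if count != 1:
        raise ValueError(f"expected exactly 1 certificate, found {count}")
```

Running `check_signing_cert_file` on the contents of a ca.crt that has had a second CA appended would raise, which is the situation this issue is about.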

I think pointing kube-controller-manager --cluster-signing-cert-file to ca.crt works for the simple (single key) case, but is incorrect in general, since it prevents the ca.crt file from being used to rotate keys. I think the correct path is to either:

  • Use a different file for --cluster-signing-cert-file that only contains the single "primary" CA cert, or
  • Change kube-controller-manager upstream to only use the first cert in ca.crt or some other logic to ignore additional certs.

What you expected to happen?

Able to append a new CA cert to /e/k/pki/ca.crt and have both CA certs accepted by controller jobs without other impact.

How to reproduce it (as minimally and precisely as possible)?

Append an additional cert to /e/k/pki/ca.crt and restart kube-controller-manager pod

anguslees avatar Jan 15 '19 04:01 anguslees

cc @liztio @fabriziopandini

neolit123 avatar Jan 15 '19 12:01 neolit123

@anguslees thanks for this issue and for the detailed explanation! Certificate rotation IMO is a topic that deserves more attention in general, and I think that we should re-iterate at sig level on two points that up to now have no clear prioritization:

  1. certificate rotation enhancements ??? (kubelet, kubeconfig files etc.)
  2. document how certificate rotation works (automatic or DIY)

What is discussed in this issue is clearly part of 1; I personally prefer the idea of changing the kube-controller-manager, because this will ease the pain on users, but I'm open to reconsidering this in the context of the discussion above...

fabriziopandini avatar Jan 17 '19 08:01 fabriziopandini

This is a broader topic and we should loop in @kubernetes/sig-auth-misc on approach + fixes.

timothysc avatar Jan 27 '19 22:01 timothysc

For the immediate question, the signing cert/key given to the controller manager should be distinct from the trust bundle. Coupling the two and having kube-controller-manager treat the first cert in the bundle specially seems ill-advised, since order in trust bundles is not typically significant.
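As a rough illustration of keeping the signing credentials apart from the trust bundle, the sketch below builds the relevant kube-controller-manager arguments from three separate paths. The helper name and the file paths in the usage note are hypothetical. It assumes --root-ca-file is the flag that points at the bundle, and it deliberately leaves --client-ca-file out, since this thread does not settle how that flag should be handled.

```python
def controller_manager_cert_args(signing_cert: str, signing_key: str,
                                 trust_bundle: str) -> list[str]:
    """Build KCM arguments with the signing cert decoupled from the bundle.

    signing_cert and signing_key name the single CA cert/key pair used for
    signing; trust_bundle names a file that may hold several CA certs.
    """
    if signing_cert == trust_bundle:
        # Sharing one file is exactly the coupling this issue argues against.
        raise ValueError("signing cert and trust bundle must be separate files")
    return [
        f"--cluster-signing-cert-file={signing_cert}",
        f"--cluster-signing-key-file={signing_key}",
        f"--root-ca-file={trust_bundle}",
    ]
```

For example, `controller_manager_cert_args("/pki/ca-signing.crt", "/pki/ca.key", "/pki/ca-bundle.crt")` (made-up paths) yields three arguments in which only the bundle file is expected to grow during a rotation.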

liggitt avatar Jan 27 '19 22:01 liggitt

Signing certs and trust bundles are clearly different things indeed.

So, I tried to separate one from the other in https://github.com/rojkov/kubernetes/commit/afed24fecbfa32d7c939fc86a9f0361e1677824c. Cluster creation (and node join) seems to work. The algorithm for cert renewal is such that new CA certs are added to the bundle and get removed only after their expiration date. I'm not sure how to test upgrade scenarios though.
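The renewal rule described above (new CA certs are added to the bundle, old ones leave only once they have expired) could look roughly like this. This is a sketch of the idea, not the code in the linked commit; `BundleEntry` and `renew_bundle` are hypothetical names, and a real implementation would read the expiry from the parsed certificate instead of a hand-filled field.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class BundleEntry:
    """Hypothetical stand-in for one parsed CA certificate in the bundle."""
    name: str
    not_after: datetime  # expiry time, timezone-aware


def renew_bundle(bundle: list[BundleEntry], new_ca: BundleEntry,
                 now: datetime) -> list[BundleEntry]:
    """Return a new bundle: expired entries dropped, new_ca added once."""
    kept = [entry for entry in bundle if entry.not_after > now]
    if new_ca not in kept:
        kept.append(new_ca)
    return kept
```

Under this rule an old CA that is still valid stays in the bundle next to the new one, so certificates signed by either keep verifying during the transition.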

rojkov avatar Apr 24 '19 09:04 rojkov

Signing certs and trust bundles are clearly different things indeed.

Agreed. To be clear though, the signing happens with a private key, already called out separately (--cluster-signing-key-file). We just need some way to find the corresponding public key (cert). I agree the two are conceptually separate and that the signing cert might not also be in the trust bundle somewhere, but this would be odd in this situation.

(I think we should just use a separate file, as @liggitt suggests and @rojkov's change implements. Just pointing out that we could modify controller-manager to try to find the matching public key in the trust bundle if we really wanted to avoid a separate file)

anguslees avatar Apr 26 '19 07:04 anguslees

@rojkov it looks like you have already made some good changes for this issue. Are you planning to submit a PR, or interested in help testing your change?

We're interested in this feature as well and would consider working on a similar implementation and PR otherwise. Sounds like the current recommendation is to have a separate trust bundle file and keep the ca.crt file containing the signing credentials.

astrieanna avatar May 21 '19 17:05 astrieanna

@astrieanna I'm consumed by another project ATM, but I'll try to submit a PR by the end of this week.

rojkov avatar May 22 '19 14:05 rojkov

@astrieanna it seems a simple rebase is not enough now after @fabriziopandini refactored the cert renewal code. If you need to fix this issue urgently please consider submitting your own PR.

I'll get back to k8s in 2 weeks hopefully.

rojkov avatar May 23 '19 14:05 rojkov

hi, this topic needs a design proposal. the deadline for KEPs has passed for 1.16 and thus this has to be moved to the next cycle.

neolit123 avatar Aug 03 '19 16:08 neolit123

As per kubeadm office hours discussion, we are considering certificate rotation in scope of the kubeadm operator https://github.com/kubernetes/kubeadm/issues/1698

fabriziopandini avatar Sep 16 '19 10:09 fabriziopandini

this is too late for 1.17.
/milestone v1.18

also if someone has the time please create a proposal, ideally in a google doc, and let's discuss it.
/help

neolit123 avatar Nov 09 '19 20:11 neolit123

@neolit123: This request has been marked as needing help from a contributor.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed by commenting with the /remove-help command.

In response to this:

this is too late for 1.17.
/milestone v1.18

also if someone has the time please create a proposal, ideally in a google doc, and let's discuss it.
/help

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Nov 09 '19 20:11 k8s-ci-robot

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale

fejta-bot avatar Feb 07 '20 21:02 fejta-bot

/lifecycle frozen

neolit123 avatar Feb 07 '20 21:02 neolit123

/remove-lifecycle frozen

I created a PR to work on this: https://github.com/kubernetes/kubernetes/pull/94169. I noticed that this includes no changes to kube-controller-manager, only kubeadm. So I am confused. I thought the discussion here was rooted in a confusion in kube-controller-manager.

MikeSpreitzer avatar Aug 21 '20 21:08 MikeSpreitzer

@MikeSpreitzer like i've mentioned here, ideally before proceeding with changes in kubeadm i'd like to hear from SIG Auth what is the plan for supporting bundles for some of the flags in the KCM.

this can be done in a k/k ticket that tracks this work for sig-auth and makes it more visible.

if the response is "no, some of the flags will never support bundles" we can think about what can be done on the kubeadm side.

neolit123 avatar Aug 24 '20 12:08 neolit123

i'm fine keeping this ticket open until then, and until we clarify the wording about CA rotation as requested here: https://github.com/kubernetes/website/issues/23323

neolit123 avatar Aug 24 '20 12:08 neolit123

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale

fejta-bot avatar Nov 22 '20 12:11 fejta-bot

/remove-lifecycle stale

anguslees avatar Nov 23 '20 12:11 anguslees

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale

fejta-bot avatar Feb 21 '21 12:02 fejta-bot

/remove-lifecycle stale

neolit123 avatar Feb 21 '21 14:02 neolit123

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale

fejta-bot avatar May 22 '21 15:05 fejta-bot

/remove-lifecycle stale

SataQiu avatar May 24 '21 03:05 SataQiu

For this particular reason (i.e. having to point client-ca and cluster-signing-cert at different CA files), I'm not able to automate the CA rotation. I thought I could use a config file to change the arguments kube-controller-manager accepts, but the official doc https://kubernetes.io/docs/reference/command-line-tools-reference/kube-controller-manager/ doesn't show whether KCM can accept config files.

I use command line arguments to start a KCM docker container, so changing arguments to point to different certs is not possible while rotating the CA.

Swetad90 avatar Jul 30 '21 23:07 Swetad90

this is the officially approved guide for rotating CA: https://kubernetes.io/docs/tasks/tls/manual-rotation-of-ca-certificates/

it includes notes about client-ca-file and cluster-signing-cert-file.

neolit123 avatar Jul 31 '21 02:07 neolit123

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Oct 29 '21 03:10 k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Feb 21 '22 18:02 k8s-triage-robot

/unassign

fabriziopandini avatar Feb 24 '22 14:02 fabriziopandini

this is the officially approved guide for rotating CA: https://kubernetes.io/docs/tasks/tls/manual-rotation-of-ca-certificates/

it includes notes about client-ca-file and cluster-signing-cert-file.

May I ask why client-ca-file cannot be a CA bundle? I think it is only related to KCM https auth.

chenk008 avatar Jan 17 '24 13:01 chenk008