cluster-api

Cleaner separation of kubeadm and machine bootstrapping

Open randomvariable opened this issue 4 years ago • 62 comments

User Story

As a cluster operator, with development teams requiring the use of multiple operating systems, I would like a better machine bootstrapping abstraction.

Detailed Description

Cluster API Bootstrap Provider Kubeadm currently conflates two activities:

  • Generating a kubeadm configuration
  • Generating the machine bootstrapping data that eventually executes kubeadm, which at present only supports cloud-init.

The relationship between Cluster API and machine bootstrapping has created a number of challenges:

How to secure kubeadm node joins

How to secure control plane instantiation

  • Given that instantiating a control plane at machine boot requires giving the machine key material, CAPI providers need some mechanism to share this private data with the machine; this mechanism has become commonly known as "instance metadata".
  • Users who have read-only access to the infrastructure can read all the private key material and reconstruct administrative access to both etcd and the API server, which represents a privilege escalation risk.
  • Note that this is separate from kubeadm node joins, and is not resolved by the kubelet authentication plugin proposal.
  • Some providers, e.g. AWS, have a mechanism to secure the data, but these mechanisms are wholly dependent on the inner workings of cloud-init.
    • Specific support for each bootstrap mechanism needs to be added to every cloud provider.

How to extensibly support different bootstrappers without increasing the spaghettiness

  • Cluster API currently only supports cloud-init
  • PRs are in progress to add Ignition v2 (Flatcar) support to CABPK and CAPA (with CAPA's secure instance metadata support)
    • Ignition v3 (required for Red Hat/Fedora CoreOS 4.6+) will still not be supported
    • Each bootstrapper adds complexity to CABPK and CAPA

Bootstrap reporting

It can be hard to find out why bootstrapping failed. To be fair, the number of requests for this has gone down over time thanks to improvements in CABPK and kubeadm, but it would still be nice to have.

Anything else you would like to add:

For completeness, and to avoid folk having to work through an unwieldy closed PR, I'm including the user stories and requirements in their entirety from #4221:

User Stories

ID | Title | Description
U1 | Non-cloud-init bootstrap processes | Ignition is a user-data processing Linux bootstrapping system used by Flatcar Container Linux, RHEL Atomic Host and Fedora CoreOS. (cluster-api/3761)
U2 | System preparation | Although Flatcar Container Linux is being added to Image Builder, Flatcar is also intended to be used as an immutable distribution, with all additions being made at first boot. Flatcar users should be able to use standard Flatcar images with Cluster API.
U3 | Active Directory | As a platform operator of a Windows environment, I may require my Kubernetes nodes to be domain joined so that application workloads operate with the appropriate Kerberos credentials to connect to services in the infrastructure. Windows or Linux hosts joining Active Directory must effectively be given a set of bootstrap credentials to join the directory and persist a Kerberos keytab for the host.
U4 | CIS Benchmark compliance | As a platform operator, I require Kubernetes clusters to pass the CIS Benchmark in order to meet organisational security compliance requirements.
U5 | DISA STIG compliance | As a platform operator in a US, UK, Canadian, Australian or New Zealand secure government environment, I require my Kubernetes clusters to be compliant with the DISA STIG.
U6 | Kubeadm UX | As a cluster operator, I would like the bootstrap configuration of clusters or machines to be shielded from changes happening in kubeadm (e.g. the v1beta1 to v1beta2 type migration).
U7 | Existing clusters | As a cluster operator with existing clusters, I would like to be able, after enabling the necessary flags or feature gates, to create new clusters or machines using nodeadm.
U8 | Air-gapped | As a cluster operator, I need Cluster API to operate independently of an internet connection in order to provision clusters in an air-gapped environment, i.e. where the data center is not connected to the public internet.
U9 | Advanced control plane configuration files | As a cluster operator, I need to configure components of my control plane, such as audit logging policies, KMS encryption and authentication webhooks, to meet organisational requirements.
U10 | containerd configuration | Options such as proxy configuration, registry mirrors, custom certs and cgroup hierarchy (image-builder/471) often need to be customised, and it isn't always suitable to do so at the image level. Cluster operators in an organisation often resort to preKubeadmCommands bash scripts to configure containerd and restart the service.
U11 | API server auth reconfiguration | As a cluster operator, I need to reconfigure the API server so that I can deploy a new static pod for authentication and insert an updated API server configuration.
U12 | Improving bootstrap reporting | SRE teams often need to diagnose failed nodes; better information about why a node failed to join, or a clearer indication of success, would be helpful. (cluster-api/3716)
U13 | Large payloads | Some vendors and advanced cluster operators may need to drop large payloads into the bootstrap configuration for a number of tasks, such as dropping CA certificates or bootstrapping network components. Cloud providers often limit the size of bootstrap data (e.g. AWS, Azure and vSphere).
U14 | External bootstrappers | This captures the current state, in which Cluster API allows external bootstrappers to exist; this should not be changed.

Requirements Specification

We define three modalities of the node bootstrapper:

Mode | Description
Provisioning | Expected to run as part of machine bootstrapping (e.g. as part of cloud-* systemd units or Windows OOBE). Only supported when used with Cluster API bootstrapping. Typically executes cluster creation or node join procedures, configures the kubelet, etc.
Preparation | Could be run as part of machine bootstrapping prior to “provisioning”, and “prepares” a machine for use with Kubernetes. We largely keep this out of scope for the initial implementation unless there is a trivial implementation.
Post | Parts of the use cases above require ongoing management of a host. We list these as requirements, but they are largely out of scope for the machine bootstrapper and should be dealt with by external systems.
ID | Requirement | Mode | Related stories
R1 | The machine bootstrapper MUST be able to execute kubeadm and report its outcome. | Provisioning | U1
R2 | The machine bootstrapper MUST allow the configuration of Linux sysctl parameters. | Preparation | U2, U4
R3 | The machine bootstrapper COULD allow the application of custom static pods on the control plane. | Provisioning | U4, U9
R4 | The machine bootstrapper MUST NOT directly expose the kubeadm API to the end user. | Provisioning | U6
R5 | The machine bootstrapper MUST be usable in conjunction with an OS-provided bootstrapping tool, including but not limited to cloud-init, Ignition, Talos and Windows Answer File. | Provisioning | U1
R6 | The machine bootstrapper/authenticator binary MUST provide cryptographic verification in situations where it is downloaded post-boot. | Preparation | U2
R7 | The machine bootstrapper MUST NOT rely on static pods to operate. | All | U5
R8 | The machine bootstrapper MUST enable a Windows node to be domain joined. The machine bootstrapper WILL NOT manage the group membership of a Windows node in order to enable Group Managed Service Accounts. | Provisioning | U3
R9 | The node bootstrapping system MUST be opt-in and not affect the operation of existing clusters when Cluster API is upgraded. | Provisioning | U7
R10 | The machine bootstrapper system SHOULD allow the agent to be downloaded from the management cluster. | Preparation | U8
R11 | The machine bootstrapper MUST be able to operate without connectivity to the internet (given the proper configuration parameters) or to the management cluster. | Provisioning | U7
R12 | When the machine bootstrapper is downloaded on boot, the location MUST be configurable. | Preparation | U8
R13 | When the machine bootstrapper is downloaded from the public internet, it MUST be downloadable from a location not subject to frequent rate limiting (e.g. a GCS bucket). | Preparation | U9
R14 | The machine bootstrapper MUST be able to configure containerd given a structured configuration input. | Provisioning | U10
R15 | The machine bootstrapper MUST publish a documented contract for operating system maintainers to integrate with the machine bootstrapper. | All | U1
R16 | The machine bootstrapper MUST support pulling payloads from a defined location outside of the cloud provider's Instance Metadata Service in order to cope with large payloads. | All | U13
R17 | The machine bootstrapper MUST NOT preclude the use of external bootstrappers, which are supported today. | All | U14

/kind feature

An example of the current flow for AWS (courtesy of @PushkarJ): [image]

randomvariable, Sep 22 '21 15:09

/assign @killianmuldoon

vincepri, Sep 22 '21 16:09

One excellent suggestion from the lengthy discussion in that proposal was that we should use https://github.com/mozilla/sops as the encryption envelope for private key material.

randomvariable, Sep 22 '21 16:09

/cc @t-lo

CecileRobertMichon, Sep 22 '21 16:09

Just wanted to clarify what we are talking about here: is the purpose of this issue to define an interface for various providers to implement (for different OSs?), or to define a tool that will be implemented?

Reading through some of this it seemed as if this is talking about building some bootstrap binary that would allow configuration of various OSs, but I initially had assumed this would define an interface

JoelSpeed, Sep 23 '21 15:09

Reading through some of this it seemed as if this is talking about building some bootstrap binary that would allow configuration of various OSs, but I initially had assumed this would define an interface

I think it would be both: An interface with a default implementation.

randomvariable, Sep 24 '21 13:09

/milestone v1.0

vincepri, Sep 30 '21 14:09

/kind proposal

vincepri, Sep 30 '21 14:09

cc @richardcase @codablock

I've been reviewing some PRs in CAPA, namely https://github.com/kubernetes-sigs/cluster-api-provider-aws/pull/2854 and it looks like EKS has the same challenges for some areas, e.g. User Story 10, and the way it's being tackled there is to add shell scripts to the EKS image builder equivalent and then add an API in CAPA. I wonder if we should consider this as a new project and get some folk together.

randomvariable, Oct 21 '21 10:10

I've been reviewing some PRs in CAPA, namely kubernetes-sigs/cluster-api-provider-aws#2854 and it looks like EKS has the same challenges for some areas, e.g. User Story 10, and the way it's being tackled there is to add shell scripts to the EKS image builder equivalent and then add an API in CAPA. I wonder if we should consider this as a new project and get some folk together.

It does have some of the same challenges for sure. And yes it would be good to get some people together to start discussions on this.

richardcase, Oct 21 '21 10:10

@randomvariable can we include a story/req here to preserve the existing ability for users to plug in their own bootstrapping mechanism? This can be achieved today in two different ways: a) pre-populating a custom bootstrap secret and setting it at machine creation time; b) implementing a custom bootstrap provider.

enxebre, Oct 25 '21 11:10

can we include a story/req here to satisfy existing ability for users to plugin their own bootstrapping mechanism? This can be achieved today two different ways.

Have added as U14 and R17 respectively. Have also captured the comment from #4172 around payload size in U13 / R16.

randomvariable, Nov 03 '21 22:11

Another use case in #3782

randomvariable, Nov 05 '21 14:11

/area bootstrap

enxebre, Jan 03 '22 15:01

/milestone Next

fabriziopandini, Mar 03 '22 16:03

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot, Jun 01 '22 16:06

/remove-lifecycle stale

invidian, Jun 01 '22 16:06

/unassign @killianmuldoon
/kind api-change
/help
This work really requires a proposal to be agreed before the next API bump.

fabriziopandini, Oct 03 '22 19:10

@fabriziopandini: This request has been marked as needing help from a contributor.

Guidelines

Please ensure that the issue body includes answers to the following questions:

  • Why are we solving this issue?
  • To address this issue, are there any code changes? If there are code changes, what needs to be done in the code and what places can the assignee treat as reference points?
  • Does this issue have zero to low barrier of entry?
  • How can the assignee reach out to you for help?

For more details on the requirements of such an issue, please see here and ensure that they are met.

If this request no longer meets these requirements, the label can be removed by commenting with the /remove-help command.

In response to this:

/unassign @killianmuldoon
/kind api-change
/help
This work really requires a proposal to be agreed before the next API bump.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot, Oct 03 '22 19:10

/triage accepted

fabriziopandini, Nov 30 '22 17:11

I volunteer to draft a proposal. Will update ASAP. /assign

johananl, Aug 31 '23 09:08

@johananl - a few of us discussed this at KubeCon EU earlier this year. There is some prior art from @randomvariable in the form of a proposal (which was closed), and we started to resurrect it after KubeCon. How would you feel about collaborating on the proposal with a few like-minded people interested in this?

richardcase, Aug 31 '23 09:08

Sure, I'm counting on collaboration here @richardcase :-) Is there a way for me to see the prior art you've mentioned? Is it recorded anywhere? I don't want to do duplicate work of course.

I'm happy to add my notes so far to an existing document. If that doesn't exist, I'm happy to share a new document where people can chime in.

johananl, Aug 31 '23 11:08

The prior art is in #4221, and we started a new version of it as a Google doc to start making changes (although not many changes have been added yet 😉).

richardcase, Aug 31 '23 12:08

Great, thanks. I'll go through both. Should I "plug into" the Google doc then?

What I have so far is mainly brainstorming with myself on paper with some thoughts about the design, some questions, some potential problems etc.

johananl, Aug 31 '23 12:08

It might be good to start adding to the doc (which is the original proposal) and then we could update it & move it forward, so we can see how it changes from the original work done by @randomvariable . But if you'd like to work another way, then that's all good.

richardcase, Sep 01 '23 07:09

OK, I'll add to the doc (I can't see who the owner of the document is and didn't want to step on any toes).

johananl, Sep 01 '23 09:09

I'm adding a bunch of comments/suggestions about the existing state of the document. Hope it's not too much 😬

johananl, Sep 01 '23 10:09

Note to self and to anyone else involved: We should keep https://github.com/kubernetes-sigs/cluster-api/issues/6539 in mind in case we touch k8s object references as part of the proposed design.

johananl, Jan 17 '24 17:01

Hi @johanan, we are currently actively investigating the approaches based on the original CAEP and the doc @richardcase mentioned. Can you share the current state of the design, so the document will reflect it better?

Danil-Grigorev, Feb 13 '24 16:02

Hi @Danil-Grigorev. Glad to see more people are getting involved 🙂

My current impression is that while the original proposal touches on some important aspects of this issue, the most important concern described by @randomvariable above (the separation of bootstrappers such as kubeadm from provisioning tools such as cloud-init and Ignition) isn't handled in it. In its current state, the proposal reads as if we're starting from the solution (machineadm) and working our way back to the requirements rather than the other way around.

In addition, in my opinion the original proposal includes a lot of user stories, some of which don't seem directly related to the issue at hand (e.g. Active Directory domain joins) and might be better handled in separate proposals.

So, I'm not saying we shouldn't pursue the machineadm direction or that the user stories aren't important; I just think that adding a new binary which runs on the nodes, without first solving the conflation we have in the API, isn't going to get us where we're aiming, at least not on its own.

@richardcase what do you think about the above? Am I missing something?

In the meantime I started working on a separate design proposal which specifically addresses the conflation of bootstrap (e.g. kubeadm) and provisioning (e.g. cloud-init) because this is arguably the main thing we need to solve and I couldn't find any work around that so far. I'm happy to join efforts if there is any existing/prior work around that.

I'm still actively working on the proposal and it's by no means ready for review, but I'll share the WIP so that people can start to follow my train of thought and perhaps provide very early feedback. Here it is: https://docs.google.com/document/d/1Fz5vWwhWA-d25_QDqep0LWF6ae0DnTqd5-8k8N0vDDM/edit?usp=sharing

johananl, Feb 14 '24 13:02