
Consider moving etcd to its own volume

detiber opened this issue Apr 23 '20 • 15 comments

/kind feature

Describe the solution you'd like

Currently, by default we only provision a single storage volume for instances. We should investigate creating a dedicated volume for etcd storage and verify that its default configuration ensures both adequate and consistent performance characteristics.
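For context, a minimal sketch of the single-volume default on an AWSMachineTemplate today (field names taken from a recent CAPA API version; names and values are illustrative and may differ from the API version current when this issue was filed):

```yaml
apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
kind: AWSMachineTemplate
metadata:
  name: example-control-plane      # illustrative name
spec:
  template:
    spec:
      instanceType: m5.large
      # A single root volume backs the whole node, so etcd data in
      # /var/lib/etcd shares I/O with the OS, container images, and logs.
      rootVolume:
        size: 100   # GiB
        type: gp3
```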

detiber avatar Apr 23 '20 18:04 detiber

Interestingly, that would also enable EBS volume encryption at rest without having to go through the rigmarole of rolling your own AMI.

randomvariable avatar Apr 23 '20 19:04 randomvariable

Because this only affects control-plane nodes, it should also be able to handle any image provided by image-builder, including images that have been STIG-partitioned or that store emptyDir and containerd layers on a separate volume.

bagnaram avatar Apr 24 '20 18:04 bagnaram

/assign

bagnaram avatar Apr 24 '20 18:04 bagnaram

I believe this can be done by adding an additional field to infrav1.Instance for etcd volumes. It may be handy to allow further customization in the future, such as encryption. I'm running local tests with this configuration.
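To make that concrete, here is a rough sketch of how such a field could render on an AWSMachine spec. This is hypothetical as of this comment; the shape that eventually landed in CAPA is the nonRootVolumes list, and the device name, size, and encrypted flag below are just illustrations:

```yaml
spec:
  rootVolume:
    size: 100
    type: gp3
  # Hypothetical dedicated etcd volume; CAPA later exposed this shape
  # via the generic nonRootVolumes list rather than an etcd-specific field.
  nonRootVolumes:
    - deviceName: /dev/sdb   # secondary EBS volume intended for /var/lib/etcd
      size: 16               # GiB
      type: gp3
      encrypted: true        # the "further customization" mentioned above
```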

bagnaram avatar Apr 27 '20 14:04 bagnaram

It might make sense for this to be part of CAPI (cluster-api) rather than provider-specific. An optional etcd volume is beneficial for each of the cloud providers and will require hooks in the build process to set up the mounts/fstab entries automatically. The implementation of the volumes can remain platform-specific, since the controller will provision the EBS volume based on the AWS-specific parameters in the createInstance() function.

bagnaram avatar Apr 28 '20 17:04 bagnaram

I am testing the possibility of intercepting userData creation in the AWSMachine controller. Because the default userData is stored in the KubeadmConfig secret and fetched via machine.spec.bootstrap.dataSecretName, I think it would be possible to append commands when a customized etcd volume is used, but the volume setup would need to be prepended so it runs before kubeadm is called.

bagnaram avatar Apr 30 '20 16:04 bagnaram

Linked: https://github.com/kubernetes/kubeadm/issues/2127. The workaround is to manually remove lost+found inside preKubeadmCommands.
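A minimal sketch of that workaround in a KubeadmControlPlane (the same stanza works in a standalone KubeadmConfig), assuming the etcd volume is formatted and mounted at kubeadm's default /var/lib/etcd path:

```yaml
kubeadmConfigSpec:
  preKubeadmCommands:
    # mkfs leaves a lost+found directory on the freshly formatted volume,
    # and kubeadm refuses to init into a non-empty /var/lib/etcd
    # (kubernetes/kubeadm#2127), so clear it before kubeadm runs.
    - rm -rf /var/lib/etcd/lost+found
```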

bagnaram avatar May 01 '20 15:05 bagnaram

/priority important-longterm
/milestone v0.5.x

vincepri avatar May 01 '20 18:05 vincepri

https://github.com/kubernetes-sigs/cluster-api/issues/2994

CecileRobertMichon avatar May 01 '20 18:05 CecileRobertMichon

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now, please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.

/lifecycle stale

fejta-bot avatar Jul 30 '20 18:07 fejta-bot

/lifecycle frozen

detiber avatar Jul 30 '20 19:07 detiber

/triage accepted

sedefsavas avatar Nov 01 '21 17:11 sedefsavas

/unassign @bagnaram

If you are working on this, feel free to assign it back to yourself.

sedefsavas avatar Nov 01 '21 17:11 sedefsavas

With https://github.com/kubernetes-sigs/cluster-api/pull/3066/ merged in cluster-api, this use case is now possible, although it is not the CAPA default. To use a separate volume for etcd (a combined sketch of both steps follows the list):

  1. Configure the additional volume in the AWSMachine (or AWSMachineTemplate used to create it) resource.
  2. Configure a mount of the volume in the KubeadmConfig resource.
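A combined sketch of those two steps, assuming the extra EBS volume is attached as /dev/sdb and etcd stays at its default /var/lib/etcd path. API versions, device names, sizes, and volume types are illustrative and may need adjusting for your setup (for example, NVMe-based instance types expose the device under a different name):

```yaml
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
kind: AWSMachineTemplate
metadata:
  name: control-plane
spec:
  template:
    spec:
      instanceType: m5.large
      # Step 1: an additional volume dedicated to etcd.
      nonRootVolumes:
        - deviceName: /dev/sdb
          size: 16          # GiB
          type: gp3
          encrypted: true
---
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: KubeadmControlPlane
metadata:
  name: control-plane
spec:
  # replicas, version, machineTemplate, clusterConfiguration elided
  kubeadmConfigSpec:
    # Step 2: format and mount the volume before kubeadm runs.
    diskSetup:
      filesystems:
        - device: /dev/sdb
          filesystem: ext4
          label: etcd_disk
      partitions: []
    mounts:
      - - LABEL=etcd_disk
        - /var/lib/etcd
    preKubeadmCommands:
      # Remove mkfs's lost+found so kubeadm accepts the etcd directory.
      - rm -rf /var/lib/etcd/lost+found
```

The diskSetup and mounts fields are the ones added by the linked cluster-api PR; the preKubeadmCommands line carries over the lost+found workaround noted earlier in this thread.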

dlipovetsky avatar Nov 01 '21 17:11 dlipovetsky

/remove-lifecycle frozen

richardcase avatar Jul 08 '22 22:07 richardcase

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Oct 23 '22 19:10 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Nov 22 '22 19:11 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-triage-robot avatar Dec 22 '22 20:12 k8s-triage-robot

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Dec 22 '22 20:12 k8s-ci-robot