
Unclear default behavior of SecureSecretsBackend

Open dkoshkin opened this issue 4 years ago • 8 comments

/kind bug

What steps did you take and what happened: I'm still new to CAPI/CAPA, so maybe I'm misunderstanding something. This is related to https://github.com/kubernetes-sigs/cluster-api-provider-aws/issues/2064

From the description it seems that by default it should use the Secrets Manager: https://github.com/kubernetes-sigs/cluster-api-provider-aws/blob/634f30eb5a3ab42ec21e358cfb86da740784aed7/api/v1alpha3/awsmachine_types.go#L167-L172

But I'm unclear on how it's being used.

  1. Deployed a cluster using default config from the quickstart.
  2. In the bootstrap cluster I see that the cluster.x-k8s.io/secret secrets contain all the Kubernetes certs. (Should these still be here? I thought that's what the backend would be used for.)
  3. The AWSMachineTemplate resources don't have the default SecureSecretsBackend set.
  4. If I go to the AWS console I don't see the secret.
  5. But when I deployed into a private subnet with no NAT Gateway the instance failed to come up until I added a secretsmanager VPC endpoint for the init script.

What did you expect to happen: Either Secrets Manager is used to load the Kubernetes certs by default and they are not passed in the cloud-init data, or, if the default is not to use AWS Secrets Manager, there should be no init script.

Anything else you would like to add: I also tried to set the value explicitly, but I am still seeing the certs in the Secret and no secrets in AWS Secrets Manager.

      cloudInit:
        secureSecretsBackend: secrets-manager

Environment:

CAPA v0.6.4

dkoshkin avatar Mar 18 '21 22:03 dkoshkin

You will see secrets in Kubernetes; that's an expected consequence of how Cluster API works. What we're really doing is keeping the certificates out of the EC2 user data. The problem being solved here is that anyone with read-only access to the EC2 console could otherwise reconstruct the Kubernetes CA certificate, independently of any RBAC set on the Kubernetes cluster running Cluster API.

You won't see any secrets in Secrets Manager until CAPA starts provisioning an EC2 instance, and if the VPC endpoint isn't available, then this is necessarily going to break. In addition, secrets are deleted ASAP after machine provisioning, so the lifetime is often < 1 minute.

We can clarify the documentation here though.

/kind documentation

randomvariable avatar Mar 23 '21 14:03 randomvariable

Thanks for the explanation @randomvariable! Yes, I think some additional docs would be great.

dkoshkin avatar Mar 25 '21 22:03 dkoshkin

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Sep 26 '21 17:09 k8s-triage-robot

/lifecycle rotten

k8s-triage-robot avatar Oct 26 '21 18:10 k8s-triage-robot

/lifecycle frozen

randomvariable avatar Nov 08 '21 18:11 randomvariable

@dkoshkin did you find any documentation for this? Could you explain in a bit more detail what exactly you added as the secretsmanager VPC endpoint for the init script?

nehatomar12 avatar Apr 13 '22 15:04 nehatomar12

/remove-lifecycle frozen

richardcase avatar Jul 12 '22 16:07 richardcase

@nehatomar12

For my setup I deployed the bootstrap cluster into a private subnet (with no Internet Gateway or other routing set up to reach the Internet). When this happens, the AWS APIs also become inaccessible. To get around this and let CAPA reach the AWS APIs, I created some VPC endpoints.

Here is what that would look like in Terraform for secretsmanager.

resource "aws_vpc_endpoint" "secretsmanager" {
  vpc_id             = aws_vpc.my_vpc.id
  service_name       = "com.amazonaws.us-west-2.secretsmanager"
  vpc_endpoint_type  = "Interface"
  security_group_ids = [aws_security_group.private.id]
  // the bastion machine will be accessing this endpoint
  subnet_ids          = [aws_subnet.public.id]
  private_dns_enabled = true

  tags = var.tags
}

Similarly I created VPC endpoints for:

resource "aws_vpc_endpoint" "ec2"
resource "aws_vpc_endpoint" "elasticloadbalancing"
resource "aws_vpc_endpoint" "autoscaling"
resource "aws_vpc_endpoint" "secretsmanager"
resource "aws_vpc_endpoint" "ssm"
resource "aws_vpc_endpoint" "ssmmessages"
resource "aws_vpc_endpoint" "ec2messages"
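
As a rough sketch only, the seven endpoints listed above could be declared in one resource with for_each instead of seven separate blocks. This assumes the same aws_vpc.my_vpc, aws_security_group.private, aws_subnet.public and var.tags as in the secretsmanager snippet; var.region, the local name capa_endpoint_services and the resource label "capa" are hypothetical names introduced here for illustration, not something taken from the original setup.

```hcl
# Hypothetical sketch: one interface endpoint per AWS service listed above.
# var.region is an assumed input variable, e.g. "us-west-2".
locals {
  capa_endpoint_services = [
    "ec2",
    "elasticloadbalancing",
    "autoscaling",
    "secretsmanager",
    "ssm",
    "ssmmessages",
    "ec2messages",
  ]
}

resource "aws_vpc_endpoint" "capa" {
  # toset() gives for_each a set of strings; each.key is the service name.
  for_each = toset(local.capa_endpoint_services)

  vpc_id              = aws_vpc.my_vpc.id
  service_name        = "com.amazonaws.${var.region}.${each.key}"
  vpc_endpoint_type   = "Interface"
  security_group_ids  = [aws_security_group.private.id]
  subnet_ids          = [aws_subnet.public.id]
  private_dns_enabled = true

  tags = var.tags
}
```

If this replaces the standalone secretsmanager resource shown earlier, only one of the two should be kept, since both would create an endpoint for the same service in the same VPC.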

dkoshkin avatar Jul 12 '22 16:07 dkoshkin

/lifecycle stale

k8s-triage-robot avatar Oct 23 '22 21:10 k8s-triage-robot

/lifecycle rotten

k8s-triage-robot avatar Nov 22 '22 21:11 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-triage-robot avatar Dec 22 '22 22:12 k8s-triage-robot

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Dec 22 '22 22:12 k8s-ci-robot