cluster-api-provider-aws
Clear out sensitive files after successful bootstrap
/kind feature
We keep some files around that contain sensitive information; ideally they should be removed once all the conditions for proper cluster creation have been met. I would like a high level of confidence that everything in those files has been used, so that the files can be safely deleted without impacting future troubleshooting or remediation.
Describe the solution you'd like: Delete sensitive files that are no longer needed, reducing the number of locations on a given VM that have to be protected.
Anything else you would like to add: Some of this might require changes to Cluster API and not just Cluster API Provider AWS.
Environment:
- Cluster-api-provider-aws version:
- Kubernetes version (use kubectl version):
- OS (e.g. from /etc/os-release):
This is partly a CAPI concern, and partly a matter of deleting /etc/cloud-init-secret.yaml, the file created by the secrets manager script.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
This can be remediated with a post-kubeadm script, but it would still be a good thing to do as built-in security cleanup.
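For anyone who wants that workaround today, the cleanup can be expressed directly in the bootstrap config. Below is a minimal sketch, assuming a KubeadmConfigTemplate and the v1beta1 API; the object name and file paths are illustrative only and vary by image and bootstrap provider:

```yaml
apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
kind: KubeadmConfigTemplate
metadata:
  name: example-md-0            # hypothetical name
spec:
  template:
    spec:
      postKubeadmCommands:
        # Remove bootstrap artifacts that may still hold join tokens or certs.
        # Adjust the paths to whatever your image actually writes.
        - rm -f /run/kubeadm/kubeadm-join-config.yaml /tmp/kubeadm.yaml
        - rm -f /var/lib/cloud/instances/*/cloud-config.txt
```

The same postKubeadmCommands field is available on KubeadmControlPlane under spec.kubeadmConfigSpec for control plane machines.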
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
/triage accepted
/priority important-longterm
/lifecycle frozen
/help
Need to understand cloud-init a bit if you're going to pick it up. Do reach out.
@randomvariable: This request has been marked as needing help from a contributor.
Guidelines
Please ensure that the issue body includes answers to the following questions:
- Why are we solving this issue?
- To address this issue, are there any code changes? If there are code changes, what needs to be done in the code and what places can the assignee treat as reference points?
- Does this issue have zero to low barrier of entry?
- How can the assignee reach out to you for help?
For more details on the requirements of such an issue, please see here and ensure that they are met.
If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.
In response to this:
/help
Need to understand cloud-init a bit if you're going to pick it up. Do reach out.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
I will write an ADR for this with @shysank to explain how AWS Systems Manager would be a good fit for this cleanup. https://docs.aws.amazon.com/systems-manager/latest/userguide/execute-remote-commands.html
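For context, Run Command executes an SSM document against instances. A hedged sketch of what such a cleanup document could look like (the document content and paths below are assumptions for illustration, not something from the ADR):

```yaml
# Hypothetical SSM Command document (schemaVersion 2.2) an operator could run
# via Run Command against workload cluster instances. Paths are examples only.
schemaVersion: "2.2"
description: Remove bootstrap files that may contain sensitive data
mainSteps:
  - action: aws:runShellScript
    name: cleanupBootstrapFiles
    inputs:
      runCommand:
        - rm -f /run/kubeadm/kubeadm-join-config.yaml /tmp/kubeadm.yaml
```

Something like `aws ssm send-command --document-name <name> --targets "Key=instanceids,Values=<instance-id>"` could then invoke it, provided the instances run the SSM agent and have an instance profile that permits it.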
CAPZ has a similar mechanism for running commands in VMs post-deployment, called VM extensions. It is not used for cleaning up sensitive data yet, only for setting conditions based on the existence of the sentinel file /run/cluster-api/bootstrap-success.complete.
Relevant issues:
- https://github.com/kubernetes-sigs/cluster-api/issues/1739
- https://github.com/kubernetes-sigs/cluster-api-provider-vsphere/issues/582
- https://github.com/kubernetes-sigs/cluster-api-provider-azure/issues/915
Some locations to check for sensitive info (haven't confirmed the accuracy of the list below yet):
- /var/lib/cloud/instances/i-002dcd6cf5a525f11/cloud-config.txt
- /run/kubeadm/kubeadm-join-config.yaml
- /tmp/kubeadm.yaml
- var/log/cloud-init-logs
/assign
cc @srm09 for CAPV
Just make sure it's toggle-able, since AWS still operates a few locations without AWS Systems Manager
Thanks for the early feedback @voor. I was thinking that once this is in, we could also benefit from using AWS Systems Manager to collect more info from the instances in the future.
In that case, a postKubeadmConfig script might be a solution that would work everywhere.
Looking at this list, looks like it exists in all regions: https://aws.amazon.com/about-aws/global-infrastructure/regional-product-services/
Since AWS Systems Manager support does not exist in every region, we will go with PostKubeadmCommands. This doesn't really need a design doc.
/unassign
/assign
/remove-lifecycle frozen
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
This issue has not been updated in over 1 year, and should be re-triaged.
You can:
- Confirm that this issue is still relevant with /triage accepted (org members only)
- Close this issue with /close
For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/
/remove-triage accepted