cluster-api-provider-aws

EPIC: Production-level documentation

Open randomvariable opened this issue 4 years ago • 9 comments
/kind documentation /help

I've been going through documents for AWS Technical Baseline Reviews, and have drawn up this list of documentation that we should have to help end-users based on their checklist.

  • [ ] Typical deployment with list of all resources
  • [ ] List all deployment options (single-AZ, multi-AZ, multi-region)
  • [ ] Expected time to complete deployment
  • [ ] List skills / knowledge to complete deployment (familiarity with AWS, specific services etc...)
  • [ ] Supported environment configurations (networking, DNS etc...)
  • [ ] Architecture diagram using AWS simple icons, labelling where user data is stored
  • [ ] Network diagram showing VPCs, subnets, security groups, NACLs, and ingress/egress mappings
  • [ ] Integration points showing third-party assets (e.g. Kubernetes OCI registries)
  • [ ] Links to IAM and IAM best practice documentation
  • [ ] How to deploy without root privileges
  • [ ] Prescriptive guidance on least privilege policies
  • [ ] Clearly highlight public resources (like AMIs, clusterctl GitHub repos)
  • [ ] Describe purpose and location of each key (EBS root volume encryption etc...)
  • [ ] Document maintenance of AWS Secrets Manager
  • [ ] Highlight where sensitive data is stored (PVCs and etcd root volumes)
  • [ ] List of all billable services, showing which are mandatory or optional
  • [ ] Guidance for EC2 instance type and size selection
  • [ ] Guidance for EBS volume type and size selection
  • [X] Step-by-step instructions for typical deployment architecture
  • [ ] Step-by-step deployment guide for maximising uptime and reliability
  • [ ] Prescriptive guidance for testing and troubleshooting
  • [ ] Step-by-step instructions on how to assess and monitor the health of the cluster and Cluster API
  • [ ] Step-by-step instructions for restoring data from a backup
  • [ ] Step-by-step instructions for recovery from instance failure
  • [ ] Step-by-step instructions for recovery from AZ failure
  • [ ] Documentation on managing AWS & K8s service limits to allow for disaster recovery
  • [ ] Documented RTO and RPOs for deployments
  • [ ] Step-by-step instructions for rotating credentials and cryptographic keys
  • [ ] Prescriptive guidance for software patches and upgrades
  • [ ] Prescriptive guidance for managing AWS service limits
  • [ ] Step-by-step instructions on handling fault conditions
  • [ ] Step-by-step instructions for recovery
  • [ ] How to use externally provisioned ASGs via third-party services for both unmanaged and EKS
  • [ ] How to run "airgapped"
  • [ ] How to bootstrap with temporary credentials
  • [ ] Diagnosing CloudFormation errors

randomvariable avatar Feb 24 '21 10:02 randomvariable

@randomvariable: This request has been marked as needing help from a contributor.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed by commenting with the /remove-help command.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Feb 24 '21 10:02 k8s-ci-robot

Great list @randomvariable. I can help with some of this.

richardcase avatar Feb 27 '21 08:02 richardcase

Architecture diagram using AWS simple icons, labelling where user data is stored

For my part, I'd also be OK with using e.g. https://github.com/kubernetes/community/tree/master/icons

sftim avatar May 05 '21 22:05 sftim

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Sep 26 '21 17:09 k8s-triage-robot

/lifecycle frozen

richardcase avatar Sep 29 '21 06:09 richardcase

@sfzylad, let's chat about this at some point too.

randomvariable avatar Nov 08 '21 18:11 randomvariable

/priority important-longterm

randomvariable avatar Nov 08 '21 18:11 randomvariable

/remove-lifecycle frozen

richardcase avatar Jul 12 '22 16:07 richardcase

/lifecycle stale

k8s-triage-robot avatar Oct 10 '22 16:10 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Nov 09 '22 17:11 k8s-triage-robot