Document migration steps to CCM

Open andrewsykim opened this issue 5 years ago • 12 comments

We should document how a user would manually migrate their clusters from using in-tree cloud providers to out-of-tree cloud provider. The documented steps can be manual or via a tool like kubeadm.

andrewsykim avatar Jan 22 '20 19:01 andrewsykim

To get started, a rough outline:

  1. Ensure a CCM for your cloud environment is available, with roughly the same feature set as the integrated KCM provider. Determine compatibility issues (missing features, different implementation, etc.).
  2. Prepare your cloud environment and workloads for the migration: disable unsupported features and build a list of manual actions to be done after migration (deleting unused cloud resources, renaming, etc.).
  3. Prepare the CCM for deployment: write configuration files, deploy credentials, etc. Do not deploy the CCM yet.
  4. Disable the integrated provider in KCM and kubelet: remove the provider-specific flags, replace them with --cloud-provider=external, etc. Restart these services.
  5. Deploy the cloud provider.
  6. Ensure it synchronises correctly with the running environment, detects existing resources, and deploys new resources where appropriate. Apply manual fixes where necessary.
  7. Test by deploying new cloud resources such as LoadBalancers and Nodes.

Expect some differences between cloud providers, as compatibility between the integrated and external implementations cannot always be guaranteed.
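As an illustration of step 4, here is a minimal Python sketch of how the command line of kubelet or KCM might be rewritten. The helper name `externalize_cloud_provider` is hypothetical, and the assumption that `--cloud-config` should be dropped along with the in-tree provider may not hold for every setup; check which flags your provider actually needs.

```python
from typing import List

# Assumption: these flags only configure the in-tree provider and can be
# dropped once the external CCM takes over. Adjust for your environment.
IN_TREE_ONLY_FLAGS = ("--cloud-config",)


def externalize_cloud_provider(args: List[str]) -> List[str]:
    """Return a copy of args with the cloud provider set to 'external'.

    Handles both '--flag=value' and '--flag value' spellings.
    """
    result: List[str] = []
    skip_next = False      # True when the next token is a value to discard
    seen_provider = False  # True once --cloud-provider=external was emitted
    for arg in args:
        if skip_next:
            skip_next = False
            continue
        name, sep, _value = arg.partition("=")
        if name == "--cloud-provider":
            if not sep:
                skip_next = True  # value follows as a separate token
            if not seen_provider:
                result.append("--cloud-provider=external")
                seen_provider = True
        elif name in IN_TREE_ONLY_FLAGS:
            if not sep:
                skip_next = True
        else:
            result.append(arg)
    if not seen_provider:
        # No provider flag was present; add it so the component defers
        # cloud-specific work to the CCM.
        result.append("--cloud-provider=external")
    return result
```

For example, `externalize_cloud_provider(["kubelet", "--cloud-provider=aws", "--v=2"])` yields `["kubelet", "--cloud-provider=external", "--v=2"]`. The services still have to be restarted afterwards, as the outline says.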

onitake avatar Jan 22 '20 20:01 onitake

I think this warrants a page in the official Kubernetes docs. @onitake, are you willing to put something together?

andrewsykim avatar Jan 22 '20 22:01 andrewsykim

Yes, I think I can do that. But I will need more input, and possibly some insight into the situation with different providers.

  • Which features are likely to have compatibility issues? Load balancers, node labeling, launch parameters, and credential injection come to mind. Others?
  • How to run the different cloud providers? Should there be an example deployment for each?
  • Do we need to account for environments where KCM and/or the CCM is/was running directly on a host as opposed to the k8s control plane?
  • Are there dependencies on the cloud provider that need to be reconfigured? There is a cloudprovider.PVLabeler interface - how is this used? Are there some cloud providers that are also storage provisioners?
  • Should monitoring topics be addressed?

onitake avatar Jan 22 '20 22:01 onitake

And also, where should the documentation live?

  • on /docs/concepts/cluster-administration/cloud-providers.md ?
  • in a new page under /docs/concepts/cluster-administration/cloud-providers/ ?
  • in a new page under /docs/tasks/administer-cluster/ ?
  • in a new tutorial page under /docs/tutorials/clusters/ ?

onitake avatar Jan 22 '20 23:01 onitake

> How to run the different cloud providers? Should there be an example deployment for each?

I think we should stick to documenting one. AWS is probably the best example because of the number of users who manage it themselves. The steps should be mostly the same across all providers as well.

> Do we need to account for environments where KCM and/or the CCM is/was running directly on a host as opposed to the k8s control plane?

I think we can assume control plane nodes are separate nodes.

> Are there dependencies on the cloud provider that need to be reconfigured? There is a cloudprovider.PVLabeler interface - how is this used? Are there some cloud providers that are also storage provisioners?

I think for the first pass, we should ignore storage providers and add CSI migration documentation iteratively.

> Should monitoring topics be addressed?

No, I think just showing how to validate that your CCM is working is fine.
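One possible validation, sketched in Python: after the CCM starts, nodes registered with an external cloud provider should lose the `node.cloudprovider.kubernetes.io/uninitialized` taint. The function name `nodes_pending_initialization` is hypothetical, and treating a missing `spec.providerID` as "not yet initialized" is an assumption that may not apply to every provider.

```python
from typing import Any, Dict, List

# Taint that kubelet applies when started with --cloud-provider=external;
# the CCM removes it once it has initialized the node.
UNINITIALIZED_TAINT = "node.cloudprovider.kubernetes.io/uninitialized"


def nodes_pending_initialization(node_list: Dict[str, Any]) -> List[str]:
    """Return names of nodes the CCM does not appear to have initialized.

    node_list is the parsed output of `kubectl get nodes -o json`.
    """
    pending: List[str] = []
    for node in node_list.get("items", []):
        spec = node.get("spec", {})
        taints = spec.get("taints") or []
        tainted = any(t.get("key") == UNINITIALIZED_TAINT for t in taints)
        # Assumption: an initialized node also carries a providerID.
        if tainted or not spec.get("providerID"):
            pending.append(node["metadata"]["name"])
    return pending
```

The input can be produced with `kubectl get nodes -o json` and loaded with `json.load`; an empty result suggests the CCM has picked up every node, though it does not prove that load balancers or routes are handled correctly.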

> where should the documentation live

I think something like docs/tasks/administer-cluster/migrating-to-cloud-controller-manager is good.

andrewsykim avatar Jan 23 '20 15:01 andrewsykim

I opened a PR. Please submit corrections and input on how to migrate on AWS. I'm slightly biased towards private cloud CCM migrations, so public cloud users, please give input on the specifics of your cloud environment.

onitake avatar Jan 24 '20 22:01 onitake

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.

/lifecycle stale

fejta-bot avatar Jul 14 '20 20:07 fejta-bot

/remove-lifecycle stale

cheftako avatar Jul 15 '20 17:07 cheftako

/lifecycle frozen

cheftako avatar Jul 15 '20 17:07 cheftako

/cc @jiahuif

cheftako avatar Jul 15 '20 17:07 cheftako

/assign @jiahuif

cheftako avatar Jul 15 '20 17:07 cheftako

/cc @jpbetz

cheftako avatar Dec 21 '20 21:12 cheftako