cloud-provider
Document migration steps to CCM
We should document how a user would manually migrate their clusters from in-tree cloud providers to out-of-tree cloud providers. The documented steps can be manual or use a tool like kubeadm.
To get started, a rough outline:
- Ensure a CCM for your cloud environment is available, with roughly the same feature set as the integrated KCM provider. Determine compatibility issues (missing features, different implementation, etc).
- Prepare your cloud environment and workloads for the migration: Disable unsupported features, build a list of manual actions to be done after migration (deleting unused cloud resources, renaming, etc).
- Prepare the CCM for deployment: Write configuration files, deploy credentials, etc. Do not deploy the CCM yet.
- Disable the integrated provider in KCM and kubelet: Remove provider-specific flags, replace the provider setting with --cloud-provider=external, etc. Restart these services.
- Deploy the cloud provider.
- Ensure it synchronises with the running environment correctly, detects existing resources, deploys new resources where appropriate. Apply manual fixes where necessary.
- Test by deploying new cloud resources such as LoadBalancers and Nodes.
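To make the flag change in the outline above more concrete, here is a minimal sketch of how the KCM or kubelet arguments might be rewritten. It assumes the components are started with `--cloud-provider` and optionally `--cloud-config`; the helper name `externalize_flags` and the example path are hypothetical, and whether `--cloud-config` can really be dropped depends on your setup.

```python
def externalize_flags(args):
    """Return a copy of args that selects the external cloud provider.

    Removes any existing --cloud-provider / --cloud-config flags (in both
    "--flag=value" and "--flag value" form) and appends
    --cloud-provider=external.
    """
    result = []
    skip_value = False
    for arg in args:
        if skip_value:
            # This argument is the value of a flag given as "--flag value".
            skip_value = False
            continue
        name, sep, _ = arg.partition("=")
        if name in ("--cloud-provider", "--cloud-config"):
            # Drop the flag; without "=", its value is the next argument.
            skip_value = not sep
            continue
        result.append(arg)
    result.append("--cloud-provider=external")
    return result


# Hypothetical KCM command line using an in-tree provider.
kcm_args = [
    "kube-controller-manager",
    "--cloud-provider=aws",
    "--cloud-config", "/etc/kubernetes/cloud.conf",
    "--leader-elect=true",
]
print(externalize_flags(kcm_args))
```

The same rewrite would apply to the kubelet arguments on every node; the services still need to be restarted afterwards.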
There will be certain differences between different cloud providers, as compatibility between integrated and external cannot always be guaranteed.
I think this warrants a page in the official Kubernetes docs, @onitake are you willing to put something together?
Yes, I think I can do that. But I will need more input, and possibly some insight on the situation with different providers.
- Which features are likely to have compatibility issues? Load balancers, node labeling, launch parameters, credential injection come to mind. Others?
- How to run the different cloud providers? Should there be an example deployment for each?
- Do we need to account for environments where KCM and/or the CCM is/was running directly on a host as opposed to the k8s control plane?
- Are there dependencies on the cloud provider that need to be reconfigured? There is a cloudprovider.PVLabeler interface - how is this used? Are there some cloud providers that are also storage provisioners?
- Should monitoring topics be addressed?
And also, where should the documentation live?
- on /docs/concepts/cluster-administration/cloud-providers.md ?
- in a new page under /docs/concepts/cluster-administration/cloud-providers/ ?
- in a new page under /docs/tasks/administer-cluster/ ?
- in a new tutorial page under /docs/tutorials/clusters/ ?
> How to run the different cloud providers? Should there be an example deployment for each?
I think we should stick to documenting one. AWS is probably the best example because of the number of users who manage it themselves. The steps should mostly be the same across all providers as well.
> Do we need to account for environments where KCM and/or the CCM is/was running directly on a host as opposed to the k8s control plane?
I think we can assume control plane nodes are separate nodes.
> Are there dependencies on the cloud provider that need to be reconfigured? There is a cloudprovider.PVLabeler interface - how is this used? Are there some cloud providers that are also storage provisioners?
I think for the first pass, we should ignore storage providers and add CSI migration documentation iteratively.
> Should monitoring topics be addressed?
No, I think just showing how to validate your CCM is working is fine.
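As a rough idea of what such a validation could look like: with `--cloud-provider=external`, the kubelet registers new nodes with the `node.cloudprovider.kubernetes.io/uninitialized` taint, and the CCM removes it once it has initialized the node. The sketch below checks parsed output of `kubectl get nodes -o json` and of a single Service of type LoadBalancer. The function names and the sample data are made up for illustration, and treating a missing `providerID` as "not initialized" is an assumption that may not fit every provider.

```python
UNINITIALIZED_TAINT = "node.cloudprovider.kubernetes.io/uninitialized"


def uninitialized_nodes(node_list):
    """Return names of nodes the CCM has apparently not initialized yet."""
    pending = []
    for node in node_list.get("items", []):
        spec = node.get("spec", {})
        taints = spec.get("taints") or []
        tainted = any(t.get("key") == UNINITIALIZED_TAINT for t in taints)
        # Assumption: an initialized node also carries a providerID.
        if tainted or not spec.get("providerID"):
            pending.append(node["metadata"]["name"])
    return pending


def load_balancer_ready(service):
    """Return True if the Service has at least one load balancer ingress."""
    ingress = service.get("status", {}).get("loadBalancer", {}).get("ingress")
    return bool(ingress)


# Made-up sample data in the shape of `kubectl get nodes -o json`.
nodes = {
    "items": [
        {
            "metadata": {"name": "node-a"},
            "spec": {"providerID": "example://node-a"},
        },
        {
            "metadata": {"name": "node-b"},
            "spec": {
                "taints": [
                    {
                        "key": UNINITIALIZED_TAINT,
                        "value": "true",
                        "effect": "NoSchedule",
                    }
                ]
            },
        },
    ]
}

# Made-up sample data in the shape of `kubectl get service <name> -o json`.
service = {
    "spec": {"type": "LoadBalancer"},
    "status": {"loadBalancer": {"ingress": [{"ip": "203.0.113.10"}]}},
}

print(uninitialized_nodes(nodes))
print(load_balancer_ready(service))
```

If the first list stays non-empty for long after the CCM is deployed, the CCM logs are the next place to look.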
> where should the documentation live
I think something like docs/tasks/administer-cluster/migrating-to-cloud-controller-manager is good.
I opened a PR. Please submit corrections and input on how to migrate on AWS. I'm slightly biased towards private cloud CCM migrations, so public cloud users, please give input on the specifics of your cloud environment.
/remove-lifecycle stale
/lifecycle frozen
/cc @jiahuif
/assign @jiahuif
/cc @jpbetz