Document how to migrate a vcluster to another host cluster

Open olljanat opened this issue 2 years ago • 2 comments

Because Kubernetes supports many different network configurations and is evolving quickly, users may need to migrate vcluster(s) from one host cluster to another.

Example scenarios where this would be needed:

  • Migration to another host cluster that uses a different CNI plugin.
  • Migration from an IPv4-only cluster to an IPv4+IPv6 dual-stack cluster.
  • Enabling new Kubernetes version capabilities for a single vcluster when the others are not ready to upgrade.
  • Splitting a host cluster into several when it gets too big.

Key points I would like to see in the documentation:

  • What should be taken care of during initial setup to simplify a later migration?
    • How to migrate persistent storage?
    • Networking requirements (CNI settings, service/cluster CIDRs, etc.)
    • Service discovery
  • How to do the migration with minimal downtime for applications.

PS. I will most likely build a lab to study this topic and can report back here, but any tips etc. would be useful.

olljanat avatar Nov 23 '21 10:11 olljanat

@olljanat thanks for creating this issue! You are correct, we need more extensive documentation in this area.

FabianKramm avatar Nov 23 '21 11:11 FabianKramm

Writing out some thoughts I have so far.

Option 1 - Disaster recovery

In small environments, and in general wherever maintenance breaks are allowed, it is most probably easiest to do disaster recovery using tools like Kasten: https://www.kasten.io/kubernetes/use-cases/application-mobility/

Option 2 - Use Longhorn as CSI

Longhorn can replicate data between clusters, which makes it usable for migrations too, as described at https://vitobotta.com/2020/10/25/kubernetes-live-migration/. That is why it might make sense to use Longhorn as the CSI plugin in environments where that is possible (it can be set to keep only 1 replica of the data if there is no need for more).

Option 3 - Zero downtime migration

In business-critical environments where maintenance breaks are not an option, and/or where the environment is simply too big to be migrated within short maintenance breaks, this needs more preparation up front (which is why I opened this issue).

Because vcluster makes it simple to run as many clusters as needed, it makes sense to split stateless and stateful workloads on day 1 and use a policy engine to make sure that volumes are not created in stateless clusters.

[Figure: usecase_stateless_split. Picture borrowed from: https://cilium.io/blog/2019/03/12/clustermesh]
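
As a rough illustration of the "no volumes in stateless clusters" rule: the thread suggests a policy engine, but a simpler stand-in is Kubernetes' built-in ResourceQuota, which can cap the number of PersistentVolumeClaims in a namespace at zero. The sketch below only generates such a manifest. The namespace and quota names are hypothetical, and whether the quota belongs inside the vcluster or in the host namespace is not something the thread settles.

```python
import json


def no_pvc_quota(namespace, name="deny-persistent-volume-claims"):
    """Build a ResourceQuota manifest that allows zero PersistentVolumeClaims.

    Assumption: a plain ResourceQuota is used here in place of the policy
    engine mentioned in the thread. Names are illustrative only.
    """
    return {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": name, "namespace": namespace},
        # "persistentvolumeclaims" limits how many PVCs may exist in the namespace.
        "spec": {"hard": {"persistentvolumeclaims": "0"}},
    }


if __name__ == "__main__":
    # "guestbook-stateless" is a hypothetical namespace name.
    print(json.dumps(no_pvc_quota("guestbook-stateless"), indent=2))
```

Note that this only blocks PVCs. It does not stop pods from using ephemeral volumes such as emptyDir, so a real policy engine may still be wanted for stricter rules.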

A multi-cluster setup of course brings a new issue: service discovery, which needs to be handled somehow. The options are a service mesh (or mess, as I call it), cluster mesh/global services as Cilium does it, or, if the goal is to also be able to change the CNI in the future, probably something like an IP pool per namespace (vcluster) together with DNS-based service discovery.

The last one would in theory be the most future-proof, especially when done with IPv6 only, when each vcluster gets its own reserved subdomain, and when the suffixes for current and future clusters are prepared on day one (one of my proposals for how to do it).

If we use Guestbook as an example, we can prepare two dev clusters for it like this:

  • Active cluster:
    • Name: guestbook-dev-a
    • PodCIDRs: 10.10.11.0/24
    • DNS domain: guestbook.dev-a.local
    • DNS search list: guestbook.dev-a.local guestbook.dev-b.local
  • Placeholder for future:
    • Name: guestbook-dev-b
    • PodCIDRs: 10.10.12.0/24
    • DNS domain: guestbook.dev-b.local
    • DNS search list: guestbook.dev-b.local guestbook.dev-a.local

and create a network policy that contains both of those PodCIDRs.
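
To make the plan above concrete, here is a minimal sketch (not a tested setup) that encodes the two clusters, checks that their PodCIDRs do not overlap, lists the names a short service name would expand to via the DNS search list, and builds a NetworkPolicy manifest allowing ingress from both PodCIDRs. The service name `frontend`, the namespace `guestbook` and the policy name are assumptions for illustration. Real resolvers also apply `ndots` and other rules that this ignores, and whether `ipBlock` matches cross-cluster pod traffic depends on the CNI preserving source IPs.

```python
import json
from dataclasses import dataclass
from ipaddress import ip_network


@dataclass(frozen=True)
class ClusterPlan:
    name: str
    pod_cidr: str
    dns_search: tuple


# Values taken from the list above.
ACTIVE = ClusterPlan(
    "guestbook-dev-a", "10.10.11.0/24",
    ("guestbook.dev-a.local", "guestbook.dev-b.local"),
)
PLACEHOLDER = ClusterPlan(
    "guestbook-dev-b", "10.10.12.0/24",
    ("guestbook.dev-b.local", "guestbook.dev-a.local"),
)


def cidrs_overlap(a, b):
    """True if the two pod ranges share any address."""
    return ip_network(a.pod_cidr).overlaps(ip_network(b.pod_cidr))


def candidate_names(short_name, plan):
    """Names tried in order when a short name goes through the search list."""
    return [f"{short_name}.{domain}" for domain in plan.dns_search]


def build_network_policy(name, namespace, plans):
    """Ingress-only NetworkPolicy admitting traffic from every plan's PodCIDR."""
    peers = [{"ipBlock": {"cidr": p.pod_cidr}} for p in plans]
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "podSelector": {},           # applies to all pods in the namespace
            "policyTypes": ["Ingress"],  # egress is left unrestricted
            "ingress": [{"from": peers}],
        },
    }


if __name__ == "__main__":
    assert not cidrs_overlap(ACTIVE, PLACEHOLDER)
    # "frontend" is a hypothetical service name.
    print(candidate_names("frontend", ACTIVE))
    # Policy name and namespace are hypothetical.
    policy = build_network_policy(
        "allow-guestbook-dev", "guestbook", [ACTIVE, PLACEHOLDER]
    )
    print(json.dumps(policy, indent=2))
```

Because each cluster lists its own domain first in the search list, a short name resolves locally while the service still runs there and falls through to the other cluster once it has moved.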

Then we can migrate service by service to the new cluster by simply adding new replicas there and removing them from the current cluster, entirely without service breaks (assuming the application architecture supports it).
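
The replica shift can be sketched as a simple plan. This is only an illustration of the ordering (add in the new cluster first, then remove from the old one) so that the total never drops below the desired count. It does not call any Kubernetes API, and the function name is made up for this example.

```python
def replica_shift_plan(total):
    """Return (old_cluster_replicas, new_cluster_replicas) after each step.

    Each round first adds one replica in the new cluster, then removes one
    from the old cluster, so at least `total` replicas are always running.
    """
    old, new = total, 0
    steps = [(old, new)]
    while old > 0:
        new += 1
        steps.append((old, new))  # surge: extra replica in the new cluster
        old -= 1
        steps.append((old, new))  # drain: one replica removed from the old cluster
    return steps


if __name__ == "__main__":
    for old, new in replica_shift_plan(3):
        print(f"old cluster: {old}  new cluster: {new}")
```

In practice each step would wait until the new replica is ready and receiving traffic before the old one is removed.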

olljanat avatar Nov 25 '21 11:11 olljanat