crossplane-runtime
Proposal: More pause options for disaster recovery control
Just writing down two related ideas here
What problem are you facing?
Disaster recovery, or migrating resources to other clusters, is hard and scary.
How could Crossplane help solve your problem?
During migration or disaster recovery, it is difficult to set "pause" on every resource individually. It would be nice to pause an entire provider, for example with a CLI argument such as --pause.
It would also be nice to have a pause option which would Observe but not Create/Update/Delete. This would give an operator confidence in what kinds of actions would run when the cluster is unpaused. This might be a different CLI option or annotation.
Another way to completely disable a provider is to set replicas to 0 in the provider's ControllerConfig.
@chlunde so we have two options for disaster recovery use-cases:
- As @bobh66 mentioned, setting the replicas to 0 for the ControllerConfig.
- Setting the pause annotation for specific resources: https://crossplane.io/docs/v1.10/concepts/managed-resources.html#pausing-reconciliations.
Would that be sufficient for your use-cases? If not, would you mind elaborating on why not?
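The two options above can be sketched as manifest edits. This is a minimal illustration, not an official tool: the helper names `pause_resource` and `scaled_down_controller_config` and the example Bucket manifest are hypothetical, and it assumes the pause annotation key is `crossplane.io/paused` and that ControllerConfig lives in `pkg.crossplane.io/v1alpha1` with a `spec.replicas` field.

```python
import copy
import json

# Assumed annotation key, as described in the linked pausing-reconciliations docs.
PAUSE_ANNOTATION = "crossplane.io/paused"


def pause_resource(manifest: dict) -> dict:
    """Return a copy of a managed resource manifest with the pause annotation set."""
    paused = copy.deepcopy(manifest)
    annotations = paused.setdefault("metadata", {}).setdefault("annotations", {})
    annotations[PAUSE_ANNOTATION] = "true"
    return paused


def scaled_down_controller_config(name: str) -> dict:
    """Build a ControllerConfig manifest that runs zero replicas of a provider."""
    return {
        "apiVersion": "pkg.crossplane.io/v1alpha1",
        "kind": "ControllerConfig",
        "metadata": {"name": name},
        "spec": {"replicas": 0},
    }


if __name__ == "__main__":
    # Hypothetical managed resource, used only for illustration.
    bucket = {
        "apiVersion": "s3.aws.upbound.io/v1beta1",
        "kind": "Bucket",
        "metadata": {"name": "example-bucket"},
    }
    print(json.dumps(pause_resource(bucket), indent=2))
    print(json.dumps(scaled_down_controller_config("paused-provider"), indent=2))
```

The ControllerConfig only takes effect for a Provider that references it through `spec.controllerConfigRef`. With thousands of managed resources, the annotation route means patching each one, which is the difficulty raised in the original proposal.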
@luebken my main worry with use cases such as
- restoring a cluster (recreate, partial restore, go back in time for a namespace) with thousands of managed resources
- restore an external resource from backup and then restore and re-attach it to a managed resource
would be that due to some unforeseen issue:
- many resources are created twice. For example, with generateName we get role-HASH2 when we already had role-HASH1. This can happen when restoring only a claim whose composition does not render a predictable name/external-name.
- resources are garbage collected, and the external resources then deleted, if we restore only managed resources without their claims
So I would like to pause Create/Update/Delete but not Observe, to verify that everything is as expected before any changes are made. Pause (as implemented today) stops reconciliation entirely, so it gives none of the reassurance of a terraform plan; an observe-without-acting mode might.
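To make the "terraform plan" comparison concrete, here is a minimal sketch of the kind of report an observe-without-acting mode could produce. It is not a Crossplane API: `Plan` and `plan_action` are hypothetical names, and desired and observed state are modeled as flat dictionaries purely for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Plan:
    name: str
    action: str  # one of "create", "update", "none"


def plan_action(name: str, desired: dict, observed: Optional[dict]) -> Plan:
    """Report what a reconcile would do, without doing it.

    observed is None when no matching external resource was found.
    """
    if observed is None:
        # Nothing found externally: an unpaused reconcile would create it.
        return Plan(name, "create")
    # Fields whose desired value differs from what was observed.
    drift = {k: v for k, v in desired.items() if observed.get(k) != v}
    return Plan(name, "update" if drift else "none")


if __name__ == "__main__":
    # A restored resource that no longer matches its external name would
    # show up as "create", flagging a possible duplicate before it happens.
    print(plan_action("role-HASH2", {"path": "/"}, None))
    print(plan_action("role-HASH1", {"path": "/"}, {"path": "/"}))
```

An operator could scan such a report for unexpected "create" entries after a restore, which is exactly the double-creation case described above.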
Now that we have observe-only resources, I don't think this is an issue anymore. We can achieve the same thing with a procedure along these lines:
- Set all replicas to 0 for providers
- Update compositions to observe-only (https://github.com/crossplane/crossplane/issues/1722)
- Restore from backup
- Re-enable providers by restoring their replica count
- Verify managed resources
- Remove observe-only from the compositions
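The "update compositions" and "remove observe-only" steps above could be scripted roughly as follows. This is a sketch under assumptions: `set_observe_only` is a hypothetical helper, the composition is assumed to use `spec.resources[].base` templates, and the field is assumed to be `spec.managementPolicies: ["Observe"]`. The exact field name and value depend on the Crossplane and provider version (earlier alpha releases used a different field), so check before relying on it.

```python
import copy


def set_observe_only(composition: dict, enabled: bool) -> dict:
    """Return a copy of a composition with observe-only toggled on every base template."""
    updated = copy.deepcopy(composition)
    for resource in updated.get("spec", {}).get("resources", []):
        spec = resource.setdefault("base", {}).setdefault("spec", {})
        if enabled:
            # Assumed field name and value; version dependent.
            spec["managementPolicies"] = ["Observe"]
        else:
            # Dropping the field falls back to the provider's default policy.
            spec.pop("managementPolicies", None)
    return updated


if __name__ == "__main__":
    # Hypothetical composition with a single composed resource.
    composition = {
        "kind": "Composition",
        "spec": {"resources": [{"name": "role", "base": {"kind": "Role", "spec": {}}}]},
    }
    observing = set_observe_only(composition, True)
    print(observing["spec"]["resources"][0]["base"]["spec"])
    restored = set_observe_only(observing, False)
    print(restored["spec"]["resources"][0]["base"]["spec"])
```

Note that this removes the field outright when disabling, so a composition that set an explicit policy before the restore would need that original value put back by hand.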