postgres-operator
Optimisation of the major upgrade to avoid replication problems
Have an idea to improve PGO? We'd love to hear it! We're going to need some information from you to learn more about your feature requests.
Please be sure you've done the following:
- [x] Provide a concise description of your feature request.
- [x] Describe your use case. Detail the problem you are trying to solve.
- [x] Describe how you envision that the feature would work.
- [x] Provide general information about your current PGO environment.
Overview
The procedure for upgrading the major version with the Crunchy Operator together with the Crunchy Upgrade Operator is a great feature and really well done. Many thanks for this.
Unfortunately, there is still a problem with the interaction with the replicas and the backup.
After the upgrade, in which only one pod/PVC is upgraded, the replica, which is correctly started with a delay, has no current data available. It therefore correctly goes into bootstrap as soon as it starts. The only problem is that this ends in a CrashLoopBackOff after a short time, because the PG IDs differ and the pod gets into trouble.
Use Case
An optimisation would solve this problem and ensure that no manual intervention is necessary after an upgrade.
Desired Behavior
My idea is to have the operator run a stanza-upgrade once the primary has started and then take a full backup. This is basically the same as when a new cluster is created, except that stanza-upgrade is used instead of stanza-create. It would give you a backup of the database as it is after the upgrade.
The second point is the replica. One solution would be to recreate the replica, including its PVC. The other would be to trigger a reinit directly via Patroni and thus avoid recreating the resources.
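To make the proposed order of steps concrete, here is a rough sketch of the commands I have in mind, written as a small Python helper that only builds and prints them. It is not how the operator works today. The stanza name `db`, the cluster name `hippo` and the replica name `hippo-instance1-abcd-0` are assumptions for illustration, and the helper `post_upgrade_commands` is hypothetical.

```python
import shlex


def post_upgrade_commands(stanza, cluster, replicas):
    """Return the commands, in order, that the proposal would run after a major upgrade.

    All argument values are supplied by the caller; nothing is executed here.
    """
    # Steps on the upgraded primary: upgrade the existing stanza
    # (instead of creating a new one), then take a full backup.
    primary_steps = [
        ["pgbackrest", f"--stanza={stanza}", "stanza-upgrade"],
        ["pgbackrest", f"--stanza={stanza}", "--type=full", "backup"],
    ]
    # Second option from the proposal: reinitialise each replica via Patroni
    # rather than recreating the pod and its PVC.
    replica_steps = [
        ["patronictl", "reinit", cluster, member, "--force"]
        for member in replicas
    ]
    return primary_steps + replica_steps


if __name__ == "__main__":
    # Assumed names, for illustration only.
    for cmd in post_upgrade_commands("db", "hippo", ["hippo-instance1-abcd-0"]):
        print(shlex.join(cmd))
```

The point of the ordering is that the full backup exists before any replica is reinitialised, so the replica has current data to bootstrap from.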
Environment
Tell us about your environment:
Please provide the following details:
- Platform: OpenShift 4.10.23
- Platform Version: 5.1.1
- Postgres Version: 13 & 14
- Storage: gp3, io2