hedera-mirror-node
Citus automated backup
Problem
We need to ensure our Citus installation has automated backups.
Solution
- Investigate Stackgres backup functionality
- Automate backup with multi-node setup.
- Document manual restore process in database.md
- Update the existing citus.md and database.md so that the backup and restore process is documented for the correct approach
Alternatives
The manual backup approach has been documented in citus.md.
The restore side of this is currently blocked by an upstream issue.
Issues / findings from testing StackGres
- pg_basebackup isn't feasible for a large database, since every base backup takes too long to create and consumes too much storage
- Using a volumesnapshot as the base backup took much longer than expected: 20+ minutes for a database with less than 1GB of data
- Creating a volumesnapshot can sometimes fail consecutively, each time with a different error
  - first:
    Failed to create snapshot content with error snapshot controller failed to update mirror-citus-coord-data-mirror-citus-coord-0 on API server: Operation cannot be fulfilled on persistentvolumeclaims \"mirror-citus-coord-data-mirror-citus-coord-0\": the object has been modified; please apply your changes to the latest version and try again
  - subsequently:
    Error from server (AlreadyExists): error when creating "STDIN": volumesnapshots.snapshot.storage.k8s.io "manual-test-coord" already exists
- Continuous archiving, which enables PITR, can't be turned off; that means a lot of WAL segments to back up under high TPS
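
The `AlreadyExists` failure above looks like what happens when a fixed snapshot name (`manual-test-coord`) is reused after a failed first attempt. As a rough sketch only, and not the procedure actually used in this testing, a manual snapshot manifest could be generated with a timestamped name so a retry never collides with a leftover object. The helper `snapshot_manifest`, its parameters, and the optional snapshot class argument are hypothetical; only the PVC name and the `manual-test-coord` prefix come from the errors above.

```python
import json
import time


def snapshot_manifest(pvc_name, prefix, snapshot_class=None, now=None):
    """Build a VolumeSnapshot manifest with a per-second unique name.

    Hypothetical helper: `snapshot_class` is an assumption and is only
    set when given; otherwise the cluster's default class is used.
    """
    # UTC timestamp keeps the name a valid Kubernetes object name
    # (lowercase letters, digits and '-').
    ts = time.strftime("%Y%m%d-%H%M%S", time.gmtime(now))
    spec = {"source": {"persistentVolumeClaimName": pvc_name}}
    if snapshot_class is not None:
        spec["volumeSnapshotClassName"] = snapshot_class
    return {
        "apiVersion": "snapshot.storage.k8s.io/v1",
        "kind": "VolumeSnapshot",
        "metadata": {"name": f"{prefix}-{ts}"},
        "spec": spec,
    }


if __name__ == "__main__":
    # PVC name taken from the first error message above.
    manifest = snapshot_manifest(
        "mirror-citus-coord-data-mirror-citus-coord-0",
        "manual-test-coord",
    )
    # kubectl accepts JSON as well as YAML on stdin.
    print(json.dumps(manifest, indent=2))
```

Assuming the script is saved as `make_snapshot.py` (a hypothetical file name), its output can be piped to `kubectl create -f -` in the namespace that holds the PVC. This only sidesteps the name collision; it does not address the first "object has been modified" conflict.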
@jnels124 Can you share the steps to reproduce this issue, along with the exact properties of the volumes used? We don't seem to hit this in the e2e testing for openebs/zfs-localpv.