edgedb-cli Add support for local instance backups

Make gel instance backup and gel instance restore work with local instances. Backups should be stored in a well-known location (sharedir/backups). As with Cloud, backups should be classified as automated or manual. Automated backups would be taken before certain destructive operations (e.g. in-place upgrades, schema migrations, etc.).

The most straightforward approach would be to use pg_basebackup in combination with --incremental. We should use pg_combinebackup to squash incremental backups so that only the N most recent automated backups are kept.
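
As a rough illustration of the two tools mentioned above, the sketch below only builds the command lines; it does not run them. The function names and the directory layout are hypothetical. It assumes PostgreSQL 17 or later, where pg_basebackup accepts --incremental with the path to a previous backup's backup_manifest (and the server has summarize_wal enabled), and where pg_combinebackup takes the backup chain oldest first plus an output directory.

```python
from pathlib import Path
from typing import Optional


def basebackup_argv(target: Path, previous: Optional[Path] = None) -> list:
    """Build a pg_basebackup command line (hypothetical helper).

    If `previous` is given, the backup is incremental relative to it.
    """
    argv = ["pg_basebackup", "--pgdata", str(target), "--checkpoint=fast"]
    if previous is not None:
        # --incremental takes the manifest file of the backup we build on.
        argv.append(f"--incremental={previous / 'backup_manifest'}")
    return argv


def combinebackup_argv(chain: list, output: Path) -> list:
    """Build a pg_combinebackup command line (hypothetical helper).

    `chain` must be ordered oldest first: the full backup, then each
    incremental backup taken on top of it.
    """
    if not chain:
        raise ValueError("need at least the full backup")
    return ["pg_combinebackup", *[str(p) for p in chain], "--output", str(output)]
```

The resulting lists could be passed to subprocess.run; squashing a chain this way yields a new synthetic full backup in the output directory, after which the inputs can be removed.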

Apr 01 '25 16:04 elprans

Task list:

[x] Full/manual backup
[x] Full/manual restore (#1592)
[x] Perform incremental backups based on heuristic and allow restore
[x] Server auto-start/auto-stop for backup/restore as needed (auto-start must use a keepalive connection to prevent server shutdown)
[ ] Prune older backups and combine incrementals into full backups as needed (use a thinning heuristic)
[ ] Perform automatic backups on various operations
[ ] WSL instance support.
[ ] Docker instance support.
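
One possible shape for the thinning heuristic in the pruning item, as a sketch only: keep every backup from the last day, then the newest backup per calendar day for a week, and drop the rest. The function name and both retention windows are assumptions, not anything decided in this issue.

```python
from datetime import datetime, timedelta


def thin(backups, now,
         keep_all_for=timedelta(days=1),
         keep_daily_for=timedelta(days=7)):
    """Return the set of backup timestamps to keep (hypothetical policy)."""
    keep = set()
    seen_days = set()
    for ts in sorted(backups, reverse=True):  # newest first
        age = now - ts
        if age <= keep_all_for:
            # Recent backups are always kept.
            keep.add(ts)
        elif age <= keep_daily_for and ts.date() not in seen_days:
            # Older backups: keep only the newest one per calendar day.
            keep.add(ts)
        seen_days.add(ts.date())
    return keep
```

Anything not in the returned set is a pruning candidate; an incremental backup in the middle of a chain would presumably have to be combined into its successor rather than simply deleted.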

Apr 15 '25 16:04 mmastrac

I think we might also need to implement support for point-in-time recovery. Incremental backups might still inject non-trivial latency into dev operations. For that, we'd need to archive WAL segments (i.e. make the CLI usable as the archive_command) and keep the WAL since the last backup. A subsequent explicit backup would then clean up all prior WAL segments. This way we may be able to implement backups asynchronously (i.e. without blocking). Technically, we may even use WAL as a sort of lightweight backup if we note the current Postgres XID at the time of instance backup (and use that as recovery_target_xid later on).
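
To make the settings involved concrete, here is a minimal sketch that renders the relevant Postgres configuration lines. archive_mode, archive_command, restore_command, recovery_target_xid and recovery_target_action are real Postgres settings; the helper names and whatever command string gets passed in (for example a CLI subcommand) are placeholders, since no such command exists yet.

```python
def pg_quote(value):
    """Quote a string for postgresql.conf (backslashes and quotes doubled)."""
    return "'" + value.replace("\\", "\\\\").replace("'", "''") + "'"


def render(settings):
    """Render a dict of settings as postgresql.conf lines."""
    return "".join(f"{k} = {pg_quote(v)}\n" for k, v in settings.items())


def archive_conf(archive_cmd):
    """Settings that turn on WAL archiving via `archive_cmd`.

    In archive_command, %p is the path of the segment to archive and
    %f is its file name.
    """
    return render({"archive_mode": "on", "archive_command": archive_cmd})


def recovery_conf(restore_cmd, target_xid):
    """Settings for recovering up to the XID noted at backup time.

    In restore_command, %f is the segment to fetch and %p is where
    Postgres wants it copied to.
    """
    return render({
        "restore_command": restore_cmd,
        "recovery_target_xid": str(target_xid),
        "recovery_target_action": "promote",
    })
```

For recovery, these lines would go into the restored data directory's configuration together with an empty recovery.signal file, which is what tells Postgres to enter targeted recovery on startup.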

Apr 17 '25 20:04 elprans

Interesting. I think we might be able to use the WAL for point-in-time recovery as you said.

Rather than performing this in the CLI, we could also consider shipping our own archive command binary in the portable distribution. That might be more reliable, given that an archive command that goes off the rails could potentially interrupt the server process itself.

The archive command invocation should copy and compress an archive file to a per-instance archive folder (data/instance.incremental_backups, for example) and perform some sort of size-based limiting. When restoring, we'd use the same command.
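
A minimal sketch of what such an archive step could do, written in Python purely for illustration (the real thing would presumably be a native binary). The function names, the .gz naming and the oldest-first eviction policy are all assumptions. It relies on WAL segment file names sorting in order within a timeline.

```python
import gzip
import shutil
from pathlib import Path


def archive_segment(segment, archive_dir, max_bytes):
    """Compress one WAL segment into archive_dir, then enforce a size cap."""
    segment, archive_dir = Path(segment), Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)
    dest = archive_dir / (segment.name + ".gz")
    if dest.exists():
        # An archive command should never overwrite an existing segment.
        raise FileExistsError(dest)
    tmp = dest.with_name(dest.name + ".tmp")
    with open(segment, "rb") as src, gzip.open(tmp, "wb") as out:
        shutil.copyfileobj(src, out)
    tmp.replace(dest)  # rename only once the compressed copy is complete
    enforce_limit(archive_dir, max_bytes, keep=dest)
    return dest


def enforce_limit(archive_dir, max_bytes, keep):
    """Delete the oldest archived segments until the folder fits the cap."""
    files = sorted(Path(archive_dir).glob("*.gz"), key=lambda p: p.name)
    total = sum(p.stat().st_size for p in files)
    for path in files:
        if total <= max_bytes:
            break
        if path == keep:
            continue  # never evict the segment we just archived
        total -= path.stat().st_size
        path.unlink()
```

Note that evicting the oldest segments breaks the WAL chain back to the last base backup, so hitting the cap would need to be paired with taking a new base backup.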

If we hit the size limit, or the full backup plus WAL otherwise grows too large, we'll probably need to run a recovery process that replays the older WAL files to build a new full backup N days further along, and then perform a basebackup from that restored database. I'm not sure if there's a way to do this with pg_combinebackup or related tools.

We should probably call txid_current() at regular intervals in CLI command invocations to track a system timestamp/txid history (AFAICT there's no timestamped history for txids). These could be stored in the same folder.
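
A sketch of how that history could be stored and queried, assuming a JSON-lines file with one entry per sample. The file format and function names are hypothetical; the txid itself would come from running SELECT txid_current() against the instance, which is left out here.

```python
import bisect
import json
from pathlib import Path


def record_txid(history, timestamp, txid):
    """Append one (timestamp, txid) sample to the history file."""
    with open(history, "a", encoding="utf-8") as f:
        f.write(json.dumps({"ts": timestamp, "txid": txid}) + "\n")


def txid_at(history, timestamp):
    """Return the latest recorded txid at or before `timestamp`, or None."""
    lines = Path(history).read_text(encoding="utf-8").splitlines()
    entries = sorted(
        (json.loads(line) for line in lines if line.strip()),
        key=lambda e: e["ts"],
    )
    # Index of the first sample strictly after the requested time.
    i = bisect.bisect_right([e["ts"] for e in entries], timestamp)
    return entries[i - 1]["txid"] if i else None
```

A restore to "yesterday at 15:00" could then look up the nearest earlier txid and use it as the recovery target; the granularity is only as fine as the sampling interval.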

Point-in-time restoration might require us to use the unarchived WAL files that are kept around for standby purposes.

There likely needs to be a per-DB configuration for how much data we should store for backups.

Apr 17 '25 21:04 mmastrac