pghoard
Restoring from GCS takes too long
Hello,
I'm trying to restore a Postgres basebackup from Google Cloud Storage to a Google Compute Engine instance in the same zone.
I cannot figure out why, but the pghoard_restore command needs ~106 minutes to download and extract a basebackup of 18 GB (48 GB original size), while gsutil takes only ~3 minutes to download the same file.
During pghoard_restore the average network throughput is ~40 Mbit/s, while gsutil reaches 1.1 Gbit/s.
What am I doing wrong? Is there a way to tune the restore process?
Pghoard config:
{
    "backup_location": "./metadata",
    "backup_sites": {
        "postgres-01": {
            "active_backup_mode": "pg_receivexlog",
            "basebackup_hour": 5,
            "basebackup_interval_hours": 24,
            "basebackup_minute": 40,
            "pg_data_directory": "/var/lib/pgsql/9.5/data",
            "nodes": [
                {
                    "host": "postgres-01",
                    "user": "pghoard",
                    "password": "secret",
                    "port": 5432
                }
            ],
            "object_storage": {
                "storage_type": "google",
                "project_id": "postgres-dev",
                "bucket_name": "test-pghoard"
            }
        }
    }
}
Restore command:
pghoard_restore get-basebackup --config pghoard.json --recovery-target-time 2019-07-04T14:45:03+00:00 --restore-to-master --overwrite --target-dir /var/lib/pgsql/9.5/data/
Output:
Found 1 applicable basebackup
Basebackup Backup size Orig size Start time
---------------------------------------- ----------- ----------- --------------------
postgres-01/basebackup/2019-07-09_05-39_0 17687 MB 48779 MB 2019-07-09T05:41:38Z
metadata: {'backup-decision-time': '2019-07-09T05:39:45.442175+00:00', 'backup-reason': 'scheduled', 'normalized-backup-time': '2019-07-08T05:40:00+00:00', 'start-wal-segment': '000000040000C1120000009E', 'pg-version': '90504', 'compression-algorithm': 'snappy', 'compression-level': '0', 'original-file-size': '51148804096', 'host': 'postgres-01'}
Selecting 'postgres-01/basebackup/2019-07-09_05-39_0' for restore
2019-07-09 16:08:09,469 ChunkFetcher Thread-1 INFO Processing of 'postgres-01/basebackup/2019-07-09_05-39_0' completed successfully
Download progress: 100.00% (17687 / 17687 MiB)
Basebackup restoration complete.
You can start PostgreSQL by running pg_ctl -D /var/lib/pgsql/9.5/data/ start
On systemd based systems you can run systemctl start postgresql
On SYSV Init based systems you can run /etc/init.d/postgresql start
PGHoard uses a single connection to download the entire file, while gsutil apparently fetches different parts of the file over separate connections. We will probably look into utilizing multiple connections in PGHoard at some point as well, to speed up downloads of individual large files. If you're running PGHoard on the same host as the PostgreSQL server itself, you could consider using the local-tar option for basebackup_mode, which creates multiple smaller files instead of a single very large file. Those files are downloaded concurrently, resulting in a much higher restore speed.
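As a rough sketch, the site configuration above might look like this with local-tar enabled (the only addition is the basebackup_mode key; the remaining site settings are unchanged, and the exact behaviour should be verified against the pghoard documentation for your version). Note that local-tar requires pghoard to run on the database host, since it reads the data directory directly instead of going through pg_basebackup:

{
    "backup_location": "./metadata",
    "backup_sites": {
        "postgres-01": {
            "basebackup_mode": "local-tar",
            "active_backup_mode": "pg_receivexlog",
            "pg_data_directory": "/var/lib/pgsql/9.5/data",
            "object_storage": {
                "storage_type": "google",
                "project_id": "postgres-dev",
                "bucket_name": "test-pghoard"
            }
        }
    }
}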
Hi, any progress on this?
This issue is preventing me from using this awesome tool in production.
Thanks
We have no plans to work on this, since we recommend using the local-tar backup mode, in which case the parallelism already works as you'd expect.