
Restoring from GCS takes too long

Open ilicmilan opened this issue 5 years ago • 3 comments

Hello,

I'm trying to restore a PostgreSQL basebackup from Google Cloud Storage to a Google Compute Engine instance in the same zone. I cannot figure out why, but the pghoard_restore command needs ~106 minutes to download and extract an 18 GB basebackup (48 GB original size), while gsutil takes only ~3 minutes to download the same file.

During pghoard_restore the average network throughput is ~40 Mbit/s, while gsutil pushes it to ~1.1 Gbit/s. What am I doing wrong? Is there any way to tune the restore process?
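
For reference, the gsutil comparison was a plain object copy, roughly like this (the exact object path is inferred from the backup listing below, so treat it as illustrative):

gsutil cp gs://test-pghoard/postgres-01/basebackup/2019-07-09_05-39_0 /tmp/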

Pghoard config:

{
    "backup_location": "./metadata",
    "backup_sites": {
        "postgres-01": {
            "active_backup_mode": "pg_receivexlog",
            "basebackup_hour": 5,
            "basebackup_interval_hours": 24,
            "basebackup_minute": 40,
            "pg_data_directory": "/var/lib/pgsql/9.5/data",
            "nodes": [
                {
                    "host": "postgres-01",
                    "user": "pghoard",
                    "password": "secret",
                    "port": 5432
                }
            ],
            "object_storage": {
                "storage_type": "google",
                "project_id": "postgres-dev",
                "bucket_name": "test-pghoard"
            }
        }
    }
}

Restore command:

pghoard_restore get-basebackup --config pghoard.json --recovery-target-time 2019-07-04T14:45:03+00:00 --restore-to-master --overwrite --target-dir /var/lib/pgsql/9.5/data/

Output:

Found 1 applicable basebackup 

Basebackup                                Backup size    Orig size  Start time          
----------------------------------------  -----------  -----------  --------------------
postgres-01/basebackup/2019-07-09_05-39_0     17687 MB     48779 MB  2019-07-09T05:41:38Z
    metadata: {'backup-decision-time': '2019-07-09T05:39:45.442175+00:00', 'backup-reason': 'scheduled', 'normalized-backup-time': '2019-07-08T05:40:00+00:00', 'start-wal-segment': '000000040000C1120000009E', 'pg-version': '90504', 'compression-algorithm': 'snappy', 'compression-level': '0', 'original-file-size': '51148804096', 'host': 'postgres-01'}

Selecting 'postgres-01/basebackup/2019-07-09_05-39_0' for restore
2019-07-09 16:08:09,469 ChunkFetcherThread-1        INFO    Processing of 'postgres-01/basebackup/2019-07-09_05-39_0' completed successfully
Download progress: 100.00% (17687 / 17687 MiB)
Basebackup restoration complete.
You can start PostgreSQL by running pg_ctl -D /var/lib/pgsql/9.5/data/ start
On systemd based systems you can run systemctl start postgresql
On SYSV Init based systems you can run /etc/init.d/postgresql start

ilicmilan avatar Jul 09 '19 14:07 ilicmilan

PGHoard uses a single connection to download the entire file, while gsutil apparently fetches different parts of the file over separate connections. We will probably look into using multiple connections in PGHoard at some point as well, to speed up downloads of individual large files. If you're running PGHoard on the same host as the PostgreSQL server itself, you could consider using the local-tar option for basebackup_mode, which creates multiple smaller files instead of a single very large file. Those files are downloaded concurrently, resulting in much higher restore speed.
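
A minimal sketch of that change, to be merged into the site config posted above (all other keys unchanged):

    "backup_sites": {
        "postgres-01": {
            "basebackup_mode": "local-tar"
        }
    }

Note that this only affects how new basebackups are taken and stored; basebackups already in the bucket remain single large files and will still be downloaded over one connection.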

rikonen avatar Aug 05 '19 06:08 rikonen

hi, any progress on this?

this issue is preventing me from using this awesome tool in production.

thanks

k1ng440 avatar Sep 08 '19 12:09 k1ng440

We have no plans to work on this, since we recommend using the local-tar backup mode, in which case the parallelism already works as you'd expect.

Ormod avatar Sep 08 '19 17:09 Ormod