Download of basebackup always stalls
Hello,
I'm facing an issue where I'm not able to download a basebackup using the pghoard_restore command because the download always stalls.
Restore command:
sudo -u postgres pghoard_restore get-basebackup --config pghoard.json --restore-to-master --overwrite --target-dir /var/lib/pgsql/9.5/data-new/
The appropriate backup is selected, but then nothing happens.
ps auxf shows that pghoard_restore spawns 9 additional processes, but the download progress stays at 0%, and after three 2-minute stall timeouts the restore fails.
Command output:
Found 1 applicable basebackup
Basebackup Backup size Orig size Start time
---------------------------------------- ----------- ----------- --------------------
server-f-postgres-01/basebackup/2019-07-10_12-27_0.00000000.pghoard 13245 MB 35432 MB 2019-07-10T12:27:32Z
metadata: {'compression-algorithm': 'snappy', 'format': 'pghoard-bb-v2', 'original-file-size': '81920', 'host': 'server-f-postgres-01', 'end-time': '2019-07-10 14:33:12.657815+02:00', 'end-wal-segment': '000000010000001A0000004A', 'pg-version': '90518', 'start-wal-segment': '000000010000001A00000048', 'total-size-plain': '37153730560', 'total-size-enc': '13888641735'}
Selecting 'server-f-postgres-01/basebackup/2019-07-10_12-27_0.00000000.pghoard' for restore
2019-07-10 15:20:34,941 BasebackupFetcher MainThread ERROR Download stalled for 120.43377648199385 seconds, aborting downloaders
2019-07-10 15:22:35,674 BasebackupFetcher MainThread ERROR Download stalled for 120.44614975301374 seconds, aborting downloaders
2019-07-10 15:24:36,392 BasebackupFetcher MainThread ERROR Download stalled for 120.47685114300111 seconds, aborting downloaders
2019-07-10 15:24:36,612 BasebackupFetcher MainThread ERROR Download stalled despite retries, aborting
FATAL: RestoreError: Backup download/extraction failed with 1 errors
pghoard.conf:
{
    "backup_location": "./metadata",
    "backup_sites": {
        "server-f-postgres-01": {
            "active_backup_mode": "pg_receivexlog",
            "basebackup_mode": "local-tar",
            "basebackup_chunks_in_progress": 5,
            "basebackup_chunk_size": 2147483648,
            "basebackup_hour": 5,
            "basebackup_interval_hours": 24,
            "basebackup_minute": 40,
            "pg_data_directory": "/var/lib/pgsql/9.5/data",
            "nodes": [
                {
                    "host": "127.0.0.1",
                    "user": "postgres",
                    "password": "secret",
                    "port": 5432
                }
            ],
            "object_storage": {
                "storage_type": "google",
                "project_id": "postgres-dev",
                "bucket_name": "test-pghoard"
            }
        }
    }
}
Hi, I have the same problem with pghoard 2.1.0. Any tips on how to solve it?
2020-10-16 11:00:09,131 BasebackupFetcher MainThread ERROR Download stalled for 120.13373475382105 seconds, aborting downloader
Thanks.
There shouldn't be any generic issue with this, as we've done a very large number of restorations across all major cloud providers and haven't seen it. If this is reproducible, you should check what's happening at the network level.
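One way to check that independently of pghoard is to pull one of the basebackup objects straight from the bucket with a chunked GCS client and see whether large chunks stall on your network. A minimal sketch in Python, assuming the google-cloud-storage package and application default credentials (pghoard itself uses googleapiclient, so this only isolates the bucket/network side; the object name is taken from the listing above and may need adjusting):

from google.cloud import storage  # pip install google-cloud-storage

client = storage.Client(project="postgres-dev")
bucket = client.bucket("test-pghoard")
# chunk_size of 50 MiB mirrors pghoard's default on machines with >= 2 GB RAM
blob = bucket.blob(
    "server-f-postgres-01/basebackup/2019-07-10_12-27_0.00000000.pghoard",
    chunk_size=50 * 1024 * 1024,
)
blob.download_to_filename("/tmp/basebackup-chunk-test")

If this download also stalls, the problem is between your network and GCS rather than in pghoard itself.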
Hi,
Looking at this line: https://github.com/aiven/pghoard/blob/master/pghoard/rohmu/object_storage/google.py#L60
# googleapiclient download performs some 3-4 times better with 50 MB chunk size than 5 MB chunk size;
# but decrypting/decompressing big chunks needs a lot of memory so use smaller chunks on systems with less
# than 2 GB RAM
DOWNLOAD_CHUNK_SIZE = 1024 * 1024 * 5 if get_total_memory() < 2048 else 1024 * 1024 * 50
UPLOAD_CHUNK_SIZE = 1024 * 1024 * 5
While debugging, including on a machine/network inside GCP itself, I realized that the problem occurs when the machine has > 2 GB of RAM, i.e. when the condition takes the 50 MB branch:
DOWNLOAD_CHUNK_SIZE = 1024 * 1024 * 5 if get_total_memory() < 2048 else 1024 * 1024 * 50
That is, the problem occurs when DOWNLOAD_CHUNK_SIZE = 50 MB.
First I tested with DOWNLOAD_CHUNK_SIZE = 1024 * 1024 * 5 and the download was successful!
The maximum value at which the download still works is DOWNLOAD_CHUNK_SIZE = 1024 * 1024 * 25, that is, 25 MB.
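For reference, this is the edited line corresponding to that test (a local workaround in the installed google.py, not an upstream change; 25 MB is simply the largest value that worked in this environment):

DOWNLOAD_CHUNK_SIZE = 1024 * 1024 * 5 if get_total_memory() < 2048 else 1024 * 1024 * 25  # 25 MB instead of 50 MB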
Is there an automated test that runs on a machine with > 2 GB of RAM?
Cheers
Is there an automated test that runs on a machine with > 2 GB of RAM?
Yes.
It would probably make sense to add an optional configuration parameter for setting the chunk size. 50 MiB performs better than 5 MiB, so it is preferable when download performance is important, and as mentioned we haven't seen any issues with it. Still, 50 MiB is a fairly large chunk size, and being able to set a smaller one via the config would be reasonable, especially if the machine is otherwise memory constrained.
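As a rough sketch of what that could look like (the "download_chunk_size" key and the helper below are illustrative only, not existing pghoard options):

def choose_download_chunk_size(object_storage_config, total_memory_mib):
    """Pick the GCS download chunk size in bytes.

    "download_chunk_size" is a hypothetical config key; when it is not set,
    fall back to the current memory-based default (5 MiB below 2 GiB RAM,
    50 MiB otherwise).
    """
    override = object_storage_config.get("download_chunk_size")
    if override:
        return int(override)
    return 1024 * 1024 * 5 if total_memory_mib < 2048 else 1024 * 1024 * 50

With something along those lines, the site's object_storage section could carry e.g. "download_chunk_size": 26214400 to cap chunks at 25 MB, as tested above.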