`fly pg backup list/restore` failed with deadline_exceeded sometimes
Please only report specific issues with flyctl behavior. Anything like a support request for your application should go to https://community.fly.io. More people watch that space and can help you faster!
Describe the bug
when we execute fly pg backup list, it sometimes failed with deadline_exceeded
❯ f pg backup list -a ***
Error: failed to exec on VM 2874262c353518: deadline_exceeded: Post "http://unix/v1/exec": net/http: request canceled (Client.Timeout exceeded while awaiting headers) (Request ID: 01JJRPZ859TZM6BZ1A9RN4AZ5N-fra)
- Operating system
❯ sw_vers
ProductName: macOS
ProductVersion: 14.5
BuildVersion: 23F79
fly version
❯ f version
fly v0.3.70 darwin/arm64 Commit: 239bc529874fd0d24276eb3fdee0d79722ad0a34 BuildDate: 2025-01-28T18:39:08Z
** Paste your fly.toml
# fly.toml app configuration file generated for ***on 2023-11-17T10:14:15+01:00
#
# See https://fly.io/docs/reference/configuration/ for information about how to use this file.
#
app = "***"
primary_region = "fra"
kill_signal = "SIGTERM"
[build]
[deploy]
release_command = "/app/bin/migrate"
strategy = "bluegreen"
[env]
DNS_CLUSTER_QUERY = "***"
PHX_HOST = "***"
PORT = "8080"
PRIMARY_REGION = "fra"
RELEASE_COOKIE = "***"
[http_service]
internal_port = 8080
force_https = true
auto_stop_machines = false
auto_start_machines = true
min_machines_running = 2
processes = ["app"]
[http_service.concurrency]
type = "connections"
hard_limit = 1000
soft_limit = 1000
[[http_service.checks]]
grace_period = "60s"
interval = "30s"
method = "GET"
timeout = "5s"
path = "/_healthy"
tls_skip_verify = false
** Command output: **
❯ f pg backup list -a ***
Error: failed to exec on VM 2874262c353518: deadline_exceeded: Post "http://unix/v1/exec": net/http: request canceled (Client.Timeout exceeded while awaiting headers) (Request ID: 01JJRPZ859TZM6BZ1A9RN4AZ5N-fra)
fyi, Tigris and backup config
❯ f pg backup config show -a ***
ArchiveTimeout = 60s
RecoveryWindow = 30d
FullBackupFrequency = 1h
MinimumRedundancy = 3
flexctl backup list sets timeout 10 seconds and it's too tight
https://github.com/fly-apps/postgres-flex/blob/bb46120d4617bef3b7c4cb0a8e21998e37cf87d7/cmd/flexctl/backups.go#L150
when I exec barman-cloud-backup-list it took 144sec, can we increate timeout? or it'd be nice if we could specify timeout from CLI
root@2871961a5dd468:/# start_time=$(date +%s)
root@2871961a5dd468:/# barman-cloud-backup-list --cloud-provider aws-s3 --endpoint-url https://fly.storage.tigris.dev --profile barman s3://*** *** > /dev/null
root@2871961a5dd468:/# end_time=$(date +%s)
root@2871961a5dd468:/# echo "Time taken: $((end_time - start_time)) seconds"
Time taken: 144 seconds
same happened when we run fly pg backup restore
it still happens
❯ time f pg backup create -a ***
Error: failed to exec on VM 683d524b426098: deadline_exceeded: Post "http://unix/v1/exec": context deadline exceeded (Client.Timeout exceeded while awaiting headers) (Request ID: 01JT1F9ZMWEER51TPERVJW47AD-ams)
real 0m11.430s
user 0m0.138s
sys 0m0.071s