
`fly pg backup list/restore` sometimes fails with deadline_exceeded

Open kubosuke opened this issue 11 months ago • 1 comments


Describe the bug

When we run fly pg backup list, it sometimes fails with deadline_exceeded:

❯ f pg backup list -a ***
Error: failed to exec on VM 2874262c353518: deadline_exceeded: Post "http://unix/v1/exec": net/http: request canceled (Client.Timeout exceeded while awaiting headers) (Request ID: 01JJRPZ859TZM6BZ1A9RN4AZ5N-fra)
  • Operating system
❯ sw_vers
ProductName:		macOS
ProductVersion:		14.5
BuildVersion:		23F79
  • fly version
❯ f version
fly v0.3.70 darwin/arm64 Commit: 239bc529874fd0d24276eb3fdee0d79722ad0a34 BuildDate: 2025-01-28T18:39:08Z

Paste your fly.toml

# fly.toml app configuration file generated for *** on 2023-11-17T10:14:15+01:00
#
# See https://fly.io/docs/reference/configuration/ for information about how to use this file.
#

app = "***"
primary_region = "fra"
kill_signal = "SIGTERM"

[build]

[deploy]
  release_command = "/app/bin/migrate"
  strategy = "bluegreen"

[env]
  DNS_CLUSTER_QUERY = "***"
  PHX_HOST = "***"
  PORT = "8080"
  PRIMARY_REGION = "fra"
  RELEASE_COOKIE = "***"
[http_service]
  internal_port = 8080
  force_https = true
  auto_stop_machines = false
  auto_start_machines = true
  min_machines_running = 2
  processes = ["app"]
  [http_service.concurrency]
    type = "connections"
    hard_limit = 1000
    soft_limit = 1000
  [[http_service.checks]]
    grace_period = "60s"
    interval = "30s"
    method = "GET"
    timeout = "5s"
    path = "/_healthy"
    tls_skip_verify = false

Command output:

❯ f pg backup list -a ***
Error: failed to exec on VM 2874262c353518: deadline_exceeded: Post "http://unix/v1/exec": net/http: request canceled (Client.Timeout exceeded while awaiting headers) (Request ID: 01JJRPZ859TZM6BZ1A9RN4AZ5N-fra)

FYI, here are the Tigris and backup configs:

Image

❯ f pg backup config show -a ***
  ArchiveTimeout = 60s
  RecoveryWindow = 30d
  FullBackupFrequency = 1h
  MinimumRedundancy = 3

kubosuke, Jan 29 '25 09:01

flexctl backup list sets a 10-second timeout, which is too tight: https://github.com/fly-apps/postgres-flex/blob/bb46120d4617bef3b7c4cb0a8e21998e37cf87d7/cmd/flexctl/backups.go#L150

When I ran barman-cloud-backup-list directly, it took 144 seconds. Can we increase the timeout? It would also be nice if we could specify the timeout from the CLI.

root@2871961a5dd468:/# start_time=$(date +%s)
root@2871961a5dd468:/# barman-cloud-backup-list --cloud-provider aws-s3 --endpoint-url https://fly.storage.tigris.dev --profile barman s3://*** *** > /dev/null
root@2871961a5dd468:/# end_time=$(date +%s)
root@2871961a5dd468:/# echo "Time taken: $((end_time - start_time)) seconds"
Time taken: 144 seconds

The same thing happens when we run fly pg backup restore.
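
To make the request concrete, here is a rough sketch of what a configurable timeout could look like. This is not the actual flexctl code: the `BACKUP_LIST_TIMEOUT` environment variable and the `resolveTimeout` and `runWithTimeout` helpers are hypothetical names. It assumes the backup commands are run as child processes under a context deadline, and it keeps the current 10-second value as the default.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"os"
	"os/exec"
	"time"
)

// defaultListTimeout mirrors the 10-second limit described in this issue.
const defaultListTimeout = 10 * time.Second

// resolveTimeout parses a duration string such as "150s" or "5m".
// Empty, invalid, or non-positive input falls back to the default.
func resolveTimeout(raw string) time.Duration {
	d, err := time.ParseDuration(raw)
	if err != nil || d <= 0 {
		return defaultListTimeout
	}
	return d
}

// runWithTimeout runs a command and kills it once the timeout elapses.
func runWithTimeout(timeout time.Duration, name string, args ...string) ([]byte, error) {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	out, err := exec.CommandContext(ctx, name, args...).CombinedOutput()
	if errors.Is(ctx.Err(), context.DeadlineExceeded) {
		return out, fmt.Errorf("%s timed out after %s: %w", name, timeout, context.DeadlineExceeded)
	}
	return out, err
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: BACKUP_LIST_TIMEOUT=5m <program> <command> [args...]")
		return
	}

	// BACKUP_LIST_TIMEOUT is a hypothetical variable, used here for illustration only.
	timeout := resolveTimeout(os.Getenv("BACKUP_LIST_TIMEOUT"))

	out, err := runWithTimeout(timeout, os.Args[1], os.Args[2:]...)
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
	}
	fmt.Print(string(out))
}
```

With something like this, a listing that takes 144 seconds would need a timeout of at least `150s`. The flyctl side that issues the exec request would presumably need a matching limit, since the error above is reported by its HTTP client.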

kubosuke, Jan 30 '25 15:01

It still happens, this time with fly pg backup create:

❯ time f pg backup create -a ***
Error: failed to exec on VM 683d524b426098: deadline_exceeded: Post "http://unix/v1/exec": context deadline exceeded (Client.Timeout exceeded while awaiting headers) (Request ID: 01JT1F9ZMWEER51TPERVJW47AD-ams)

real	0m11.430s
user	0m0.138s
sys	0m0.071s

kubosuke, Apr 29 '25 19:04